Introducing LLMs.txt Generator Endpoint (Alpha) 📃

The /llmstxt endpoint allows you to transform any website into clean, LLM-ready text files. Simply provide a URL, and Firecrawl will crawl the site and generate both llms.txt and llms-full.txt files that can be used for training or analysis with any LLM.

How It Works

The LLMs.txt Generator:

  1. Crawls the provided website URL and its linked pages
  2. Extracts clean, meaningful text content
  3. Generates two formats:
    • llms.txt: Concise summaries and key information
    • llms-full.txt: Complete text content with more detail

Example Usage

from firecrawl import FirecrawlApp

# Initialize the client
firecrawl = FirecrawlApp(api_key="your_api_key")

# Define generation parameters
params = {
    "maxUrls": 2,  # Maximum URLs to analyze
    "showFullText": True  # Include full text in results
}

# Generate LLMs.txt with polling
results = firecrawl.generate_llms_text(
    url="https://example.com",
    params=params
)

# Access generation results
if results['success']:
    print(f"Status: {results['status']}")
    print(f"Generated Data: {results['data']}")
else:
    print(f"Error: {results.get('error', 'Unknown error')}")

Key Parameters:

  • url: The website URL to generate LLMs.txt files from
  • maxUrls (Optional): Maximum number of pages to crawl (1-100, default: 10)
  • showFullText (Optional): Generate llms-full.txt in addition to llms.txt (default: false)

See API Reference for more details.
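
As a minimal sketch of the parameter rules listed above, the helper below builds a params dict and rejects values outside the documented range before any request is made. The function name build_llmstxt_params is hypothetical and not part of the Firecrawl SDK; only the maxUrls and showFullText keys, their defaults, and the 1-100 range come from this page.

```python
def build_llmstxt_params(max_urls=10, show_full_text=False):
    """Return a params dict using the documented defaults and limits.

    Hypothetical helper: it only validates locally, the API may apply
    further checks of its own.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_urls, bool) or not isinstance(max_urls, int):
        raise TypeError("max_urls must be an integer")
    # Documented range for maxUrls is 1-100
    if not 1 <= max_urls <= 100:
        raise ValueError("max_urls must be between 1 and 100")
    return {"maxUrls": max_urls, "showFullText": bool(show_full_text)}
```

The returned dict has the same shape as the params dict in the example above, so it could be passed as params=build_llmstxt_params(2, True).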

Checking Generation Status

LLMs.txt generation runs asynchronously. Start the job with the async call, then check its status with:

from firecrawl import FirecrawlApp

# Initialize the client
firecrawl = FirecrawlApp(api_key="your_api_key")

# Create async job
job = firecrawl.async_generate_llms_text(
    url="https://example.com",
)

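# Stop early if the job could not be created
if not job['success']:
    raise RuntimeError(f"Failed to start job: {job.get('error', 'Unknown error')}")

job_id = job['id']

# Check LLMs.txt generation status (pass the job ID variable, not a literal string)
status = firecrawl.check_generate_llms_text_status(job_id)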

# Print current status
print(f"Status: {status['status']}")

if status['status'] == 'completed':
    print("LLMs.txt Content:", status['data']['llmstxt'])
    if 'llmsfulltxt' in status['data']:
        print("Full Text Content:", status['data']['llmsfulltxt'])
    # processedUrls may be absent from the response, so fall back to an empty list
    print(f"Processed URLs: {len(status['data'].get('processedUrls', []))}")
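
The snippet above checks the status once. A polling loop along the following lines could be used to wait for completion. This is a sketch under assumptions: wait_for_llmstxt, interval, and timeout are hypothetical names, and any status other than "processing" is treated as final because this page only documents the "processing" and "completed" values.

```python
import time


def wait_for_llmstxt(check_status, job_id, interval=2.0, timeout=300.0):
    """Call check_status(job_id) repeatedly until the job stops processing.

    check_status is any callable that returns a status dict shaped like
    the responses shown on this page.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = check_status(job_id)
        # Assumption: a failed call or any non-"processing" status is final
        if not status.get("success") or status.get("status") != "processing":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout}s")
        time.sleep(interval)
```

With the client from the example, this could be called as wait_for_llmstxt(firecrawl.check_generate_llms_text_status, job_id). Passing the checker as a callable keeps the loop easy to test without network access.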

Status Examples

In Progress

{
  "success": true,
  "data": {
    "llmstxt": "# Firecrawl.dev llms.txt\n\n- [Web Data Extraction Tool](https://www.firecrawl.dev/)...",
    "llmsfulltxt": "# Firecrawl.dev llms-full.txt\n\n"
  },
  "status": "processing",
  "expiresAt": "2025-03-03T23:19:18.000Z"
}

Completed

{
  "success": true,
  "data": {
    "llmstxt": "# http://firecrawl.dev llms.txt\n\n- [Web Data Extraction Tool](https://www.firecrawl.dev/): Transform websites into clean, LLM-ready data effortlessly.\n- [Flexible Web Scraping Pricing](https://www.firecrawl.dev/pricing): Flexible pricing plans for web scraping and data extraction.\n- [Web Scraping and AI](https://www.firecrawl.dev/blog): Explore tutorials and articles on web scraping and AI...",
    "llmsfulltxt": "# http://firecrawl.dev llms-full.txt\n\n## Web Data Extraction Tool\nIntroducing /extract - Get web data with a prompt [Try now](https://www.firecrawl.dev/extract)\n\n[💥Get 2 months free with yearly plan](https://www.firecrawl.dev/pricing)..."
  },
  "status": "completed",
  "expiresAt": "2025-03-03T22:45:50.000Z"
}
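
Once a response like the completed example above is available, its contents can be written to disk. The sketch below assumes the response is a plain dict with the llmstxt and llmsfulltxt keys shown above. The function name save_llmstxt_files and the out_dir argument are hypothetical.

```python
from pathlib import Path


def save_llmstxt_files(status, out_dir="."):
    """Write llms.txt, and llms-full.txt when present, from a completed response."""
    if status.get("status") != "completed":
        raise ValueError("generation has not completed yet")
    data = status.get("data", {})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Map response keys to the file names used on this page
    for key, filename in (("llmstxt", "llms.txt"), ("llmsfulltxt", "llms-full.txt")):
        if key in data:
            path = out / filename
            path.write_text(data[key], encoding="utf-8")
            written.append(path)
    return written
```

The function returns the list of paths it wrote, so a caller can tell whether llms-full.txt was included in the response.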

Known Limitations (Alpha)

  1. Access Restrictions
    Only publicly accessible pages can be processed. Login-protected or paywalled content is not supported.

  2. Site Size
    Processing is limited to 5000 URLs during the alpha stage.

  3. Alpha State
    As an Alpha feature, the output format and processing may evolve based on feedback.

Billing and Usage

Billing is based on the number of URLs processed:

  • Base cost: 1 credit per URL processed
  • Control URL costs with the maxUrls parameter
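
As a rough illustration of the billing rule above, the upper bound on credits for one job can be estimated from maxUrls. The names below are hypothetical, and the sketch covers only the stated base cost of 1 credit per URL; any other charges are outside what this page describes.

```python
CREDITS_PER_URL = 1  # base cost stated above


def estimate_max_credits(max_urls=10):
    """Upper bound on base credits for one job, given its maxUrls setting."""
    return max_urls * CREDITS_PER_URL


def credits_for(processed_urls):
    """Base credits for the URLs a job actually processed."""
    return len(processed_urls) * CREDITS_PER_URL
```

For example, a job with maxUrls set to 2 would cost at most 2 base credits under this rule.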

Have feedback or need help? Email help@firecrawl.dev.