
Prerequisites

Install the SDK

pip install firecrawl-py
Add your API key to your Django settings or to your environment variables:
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
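
If the key is missing, `os.environ["FIRECRAWL_API_KEY"]` fails with a bare `KeyError` at import time. A minimal sketch of a more explicit check follows; the helper name `load_api_key` is hypothetical and not part of Django or the Firecrawl SDK:

```python
import os


def load_api_key(environ=os.environ) -> str:
    """Return the Firecrawl API key, or fail with a readable message.

    Hypothetical helper: the name and the error text are illustrative only.
    """
    key = environ.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; export it before starting Django"
        )
    return key
```

One option is to call `load_api_key()` once in settings.py and store the result in a setting such as `FIRECRAWL_API_KEY`, so a missing key is reported at startup rather than on the first request.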

Create views

Add search, scrape, and interact views to your Django app. In views.py:
import json
import os
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])


@csrf_exempt
@require_POST
def search_view(request):
    body = json.loads(request.body)
    results = firecrawl.search(body["query"], limit=body.get("limit", 5))
    return JsonResponse(
        [{"title": r.title, "url": r.url} for r in results.web],
        safe=False,
    )


@csrf_exempt
@require_POST
def scrape_view(request):
    body = json.loads(request.body)
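    result = firecrawl.scrape(body["url"])
    # JsonResponse cannot serialize arbitrary objects. Convert the metadata to
    # a plain dict when it exposes model_dump() (assumed: a Pydantic model).
    metadata = result.metadata
    if hasattr(metadata, "model_dump"):
        metadata = metadata.model_dump()
    return JsonResponse({
        "markdown": result.markdown,
        "metadata": metadata,
    })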


@csrf_exempt
@require_POST
def interact_start_view(request):
    body = json.loads(request.body)
    result = firecrawl.scrape(body["url"], formats=["markdown"])
    return JsonResponse({"scrape_id": result.metadata.scrape_id})


@csrf_exempt
@require_POST
def interact_view(request):
    body = json.loads(request.body)
    response = firecrawl.interact(body["scrape_id"], prompt=body["prompt"])
    return JsonResponse({"output": response.output})


@csrf_exempt
@require_POST
def interact_stop_view(request):
    body = json.loads(request.body)
    firecrawl.stop_interaction(body["scrape_id"])
    return JsonResponse({"status": "stopped"})
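
The views above call `json.loads(request.body)` and index the result directly, so a malformed body or a missing field surfaces as a 500 error. A minimal sketch of a validation step, assuming a hypothetical helper named `parse_payload` (not part of Django or the SDK), could look like this:

```python
import json


def parse_payload(raw_body: bytes, required: tuple) -> tuple:
    """Return (payload, error); exactly one of the two is None.

    Hypothetical helper for the views above, using only the standard library.
    """
    try:
        payload = json.loads(raw_body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None, "request body must be valid JSON"
    if not isinstance(payload, dict):
        return None, "request body must be a JSON object"
    # Report every missing field at once so the client can fix them together.
    missing = [name for name in required if name not in payload]
    if missing:
        return None, "missing required field(s): " + ", ".join(missing)
    return payload, None
```

Inside a view, this would be used as `payload, error = parse_payload(request.body, ("query",))`, returning `JsonResponse({"error": error}, status=400)` when `error` is not `None`.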

Define the URLs

In urls.py:
from django.urls import path
from . import views

urlpatterns = [
    path("api/search/", views.search_view),
    path("api/scrape/", views.scrape_view),
    path("api/interact/start/", views.interact_start_view),
    path("api/interact/", views.interact_view),
    path("api/interact/stop/", views.interact_stop_view),
]

Try it out

python manage.py runserver

# Search the web
curl -X POST http://localhost:8000/api/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "firecrawl web scraping", "limit": 5}'

# Scrape a page
curl -X POST http://localhost:8000/api/scrape/ \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Start an interactive session
curl -X POST http://localhost:8000/api/interact/start/ \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.com"}'
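
The curl commands stop at starting a session. The remaining two endpoints from urls.py can be exercised with a small standard-library client. This is a sketch only: it assumes the server from `runserver` is listening on `localhost:8000`, and the function names and the prompt text are illustrative.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumed: the runserver default address


def build_request(path: str, payload: dict) -> Request:
    """Build a JSON POST request for one of the endpoints in urls.py."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict):
    """Send the request and decode the JSON response body."""
    with urlopen(build_request(path, payload)) as response:
        return json.loads(response.read())


def run_session(url: str, prompt: str) -> str:
    """Start a session, send one prompt, and always stop the session."""
    scrape_id = post_json("/api/interact/start/", {"url": url})["scrape_id"]
    try:
        reply = post_json(
            "/api/interact/", {"scrape_id": scrape_id, "prompt": prompt}
        )
        return reply["output"]
    finally:
        # Stop the session even if the interact call fails.
        post_json("/api/interact/stop/", {"scrape_id": scrape_id})
```

With the server running, `run_session("https://example.com", "Summarize the page")` walks through start, interact, and stop in order. The `finally` block is there so that a failed prompt does not leave the session open.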

Management command

Use Firecrawl in a Django management command for scripts and data pipelines. Create management/commands/scrape.py:
import os
from django.core.management.base import BaseCommand
from firecrawl import Firecrawl


class Command(BaseCommand):
    help = "Scrape a URL and print the markdown"

    def add_arguments(self, parser):
        parser.add_argument("url", type=str)

    def handle(self, *args, **options):
        firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
        result = firecrawl.scrape(options["url"])
        self.stdout.write(result.markdown)

Run the command:
python manage.py scrape https://example.com
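
In a data pipeline, the Markdown usually ends up in a file rather than on stdout. A minimal sketch of deriving a filename from the scraped URL follows; the helper `filename_for_url` is hypothetical and the naming scheme is just one reasonable choice:

```python
import re
from urllib.parse import urlparse


def filename_for_url(url: str) -> str:
    """Derive a filesystem-safe Markdown filename from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Join host and path, then replace every run of characters that is not a
    # letter, digit, dot, or hyphen with a single hyphen.
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9.-]+", "-", raw).strip("-.")
    # Fall back to a generic name when the URL yields nothing usable.
    return f"{slug or 'page'}.md"
```

Inside `handle`, the result could be written with `pathlib.Path(filename_for_url(options["url"])).write_text(result.markdown)` instead of `self.stdout.write`.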

Next steps

Scrape docs

All scrape options, including formats, actions, and proxies

Search docs

Search the web and get full page content

Interact docs

Click, fill out forms, and extract dynamic content

Python SDK reference

Full SDK reference with crawl, map, async, and more