Prerequisites

Install the SDK

dependencies {
    implementation("com.firecrawl:firecrawl-java:1.2.0")
}

Search the web

import com.firecrawl.client.FirecrawlClient;
import com.firecrawl.models.SearchData;
import com.firecrawl.models.SearchOptions;

public class Main {
    public static void main(String[] args) {
        FirecrawlClient client = FirecrawlClient.builder()
            .apiKey("fc-YOUR-API-KEY")
            .build();

        SearchData results = client.search(
            "firecrawl web scraping",
            SearchOptions.builder().limit(5).build()
        );

        if (results.getWeb() != null) {
            for (var result : results.getWeb()) {
                System.out.println(result.get("title") + " - " + result.get("url"));
            }
        }
    }
}
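
The loop above reads each result with get("title") and get("url"), which suggests that each entry can be treated like a map. As a minimal sketch under that assumption, the helper below formats such entries and substitutes a placeholder when a key is missing, so a result without a title does not print "null". SearchResultFormatter and its method names are hypothetical and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Firecrawl SDK.
// Assumes each search result can be read as a Map<String, Object>.
final class SearchResultFormatter {

    // Formats one result as "title - url", using placeholders for missing keys.
    static String formatResult(Map<String, Object> result) {
        Object title = result.get("title");
        Object url = result.get("url");
        String safeTitle = (title == null) ? "(untitled)" : title.toString();
        String safeUrl = (url == null) ? "(no url)" : url.toString();
        return safeTitle + " - " + safeUrl;
    }

    // Formats a possibly-null list of results; a null list yields an empty list.
    static List<String> formatAll(List<Map<String, Object>> results) {
        List<String> lines = new ArrayList<>();
        if (results == null) {
            return lines;
        }
        for (Map<String, Object> result : results) {
            lines.add(formatResult(result));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Object> complete = new LinkedHashMap<>();
        complete.put("title", "Example Domain");
        complete.put("url", "https://example.com");

        Map<String, Object> missingTitle = new LinkedHashMap<>();
        missingTitle.put("url", "https://example.org");

        for (String line : formatAll(List.of(complete, missingTitle))) {
            System.out.println(line);
        }
    }
}
```

With the real client, the list returned by results.getWeb() would take the place of the hand-built maps, provided its element type matches the assumption above.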

Scrape a page

import com.firecrawl.models.Document;

Document doc = client.scrape("https://example.com");
System.out.println(doc.getMarkdown());
{
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "sourceURL": "https://example.com"
  }
}

Interact with a page

Open a browser session, run Playwright code in it, and close it when you are done:
import com.firecrawl.models.ScrapeOptions;
import com.firecrawl.models.BrowserExecuteResponse;
import java.util.List;

Document doc = client.scrape("https://www.amazon.com",
    ScrapeOptions.builder().formats(List.of((Object) "markdown")).build());
String scrapeId = (String) doc.getMetadata().get("scrapeId");

BrowserExecuteResponse run = client.interact(scrapeId,
    "const title = await page.title(); console.log(title);");
System.out.println(run.getStdout());

client.stopInteractiveBrowser(scrapeId);
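
The sequence above stops the browser session as its last statement, so an exception thrown while the Playwright code runs would skip that call. One way to guard against this is a try/finally block, sketched below. To keep the sketch runnable without the SDK, InteractiveSession is a hypothetical stand-in interface whose two methods are meant to mirror client.interact and client.stopInteractiveBrowser; it is not an SDK type, and SessionCleanup is likewise a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two client calls used above; not an SDK type.
interface InteractiveSession {
    String interact(String scrapeId, String code);
    void stop(String scrapeId);
}

final class SessionCleanup {

    // Runs the code in the session and always stops the session afterwards,
    // even if interact(...) throws.
    static String runAndStop(InteractiveSession session, String scrapeId, String code) {
        try {
            return session.interact(scrapeId, code);
        } finally {
            session.stop(scrapeId);
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // In-memory fake that records the order of calls.
        InteractiveSession fake = new InteractiveSession() {
            @Override
            public String interact(String scrapeId, String code) {
                calls.add("interact:" + scrapeId);
                return "Example Domain";
            }

            @Override
            public void stop(String scrapeId) {
                calls.add("stop:" + scrapeId);
            }
        };

        String stdout = runAndStop(fake, "demo-id", "console.log(await page.title());");
        System.out.println(stdout);
        System.out.println(calls);
    }
}
```

With the real client, the same shape would place client.interact(...) inside the try and client.stopInteractiveBrowser(scrapeId) inside the finally.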

Environment variables

Instead of passing apiKey directly, set the FIRECRAWL_API_KEY environment variable:
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
FirecrawlClient client = FirecrawlClient.fromEnv();
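
This page does not say how FirecrawlClient.fromEnv() behaves when the variable is missing. As a hedged sketch, the check below reads FIRECRAWL_API_KEY through the standard System.getenv() map and fails early with a clear message if it is unset or blank. ApiKeyConfig and requireApiKey are hypothetical names, not SDK API.

```java
import java.util.Map;

// Hypothetical helper, not part of the Firecrawl SDK.
final class ApiKeyConfig {

    static final String ENV_NAME = "FIRECRAWL_API_KEY";

    // Returns the trimmed key from the given environment map,
    // or throws if the variable is unset or blank.
    static String requireApiKey(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(ENV_NAME + " is not set");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            String apiKey = requireApiKey(System.getenv());
            // Avoid printing the key itself; report only that it was found.
            System.out.println(ENV_NAME + " found (" + apiKey.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment as a map keeps the check easy to exercise in tests without changing the real process environment.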

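Next steps

Search docs

Search the web and get full page content

Scrape docs

All scrape options, including formats, actions, and proxies

Interact docs

Click, fill in forms, and extract dynamic content

Java SDK reference

Complete SDK reference, covering crawl, map, batch scrape, and more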