Documentation Index

Fetch the complete documentation index at: https://docs.firecrawl.dev/llms.txt

Use this file to discover all available pages before exploring further.

Canonical Firecrawl Java source of truth for agents. Generated from SDK source and the v2 OpenAPI spec.

Install

Maven:

<dependency>
  <groupId>com.firecrawl</groupId>
  <artifactId>firecrawl-java</artifactId>
  <version>1.2.0</version>
</dependency>

Gradle:

implementation("com.firecrawl:firecrawl-java:1.2.0")

Authenticate

import com.firecrawl.client.FirecrawlClient;

FirecrawlClient client = FirecrawlClient.builder()
    .apiKey(System.getenv("FIRECRAWL_API_KEY"))
    .build();

When To Use What

  • search: use when you start with a query and need discovery.
  • scrape: use when you already have a URL and want page content.
  • interact: use when the page needs clicks, forms, or post-scrape browser actions.

Search

Why use it

Use search to discover relevant pages from a query, then pick URLs to scrape or interact with. You can constrain results to a site with site:, for example site:docs.firecrawl.dev crawl webhooks.
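
As a minimal sketch of the site: constraint described above, the helper below builds a site-restricted query string before it is passed to client.search. The class and method names (SiteQuery, restrictTo) are hypothetical and not part of the Firecrawl SDK; only the site: prefix itself comes from this page.

```java
// Hypothetical helper, not part of the Firecrawl SDK.
final class SiteQuery {
    private SiteQuery() {}

    /** Prefixes the search terms with a site: operator for the given domain. */
    static String restrictTo(String domain, String terms) {
        if (domain == null || domain.isBlank()) {
            // No domain given: return the plain terms unchanged apart from trimming.
            return terms.trim();
        }
        return "site:" + domain.trim() + " " + terms.trim();
    }
}
```

The resulting string can be used as the query argument, for example client.search(SiteQuery.restrictTo("docs.firecrawl.dev", "crawl webhooks")).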

Preferred SDK methods

  • client.search(query)
  • client.search(query, options)

Return value

search returns SearchData. Read result buckets with getWeb(), getNews(), and getImages() (each is List<Map<String, Object>> and may be null). Do not treat SearchData as a directly iterable list of hits.

Simple Example

import com.firecrawl.models.SearchData;
import java.util.List;
import java.util.Map;

SearchData results = client.search("site:docs.firecrawl.dev webhook retries");
List<Map<String, Object>> web = results.getWeb();
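
Because each bucket may be null, it can help to read buckets through a small null-safe helper. The sketch below is one way to do that with plain JDK types. The class name SearchBuckets is hypothetical, and the "url" key is an assumption about the shape of each hit map; check the actual response before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Firecrawl SDK.
final class SearchBuckets {
    private SearchBuckets() {}

    /** Collects string values stored under "url" (an assumed key) from one bucket. */
    static List<String> urls(List<Map<String, Object>> bucket) {
        List<String> urls = new ArrayList<>();
        if (bucket == null) {
            return urls; // a missing bucket is treated as empty
        }
        for (Map<String, Object> hit : bucket) {
            Object url = hit.get("url");
            if (url instanceof String) {
                urls.add((String) url); // skip hits without a string URL
            }
        }
        return urls;
    }
}
```

With the example above, SearchBuckets.urls(results.getWeb()) would return an empty list rather than throw when the web bucket is absent.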

Complex Example

import com.firecrawl.models.JsonFormat;
import com.firecrawl.models.ScrapeOptions;
import com.firecrawl.models.SearchData;
import com.firecrawl.models.SearchOptions;
import java.util.List;

SearchOptions options = SearchOptions.builder()
    .sources(List.of("web", "news"))
    .categories(List.of("research"))
    .limit(10)
    .tbs("qdr:m")
    .location("San Francisco,California,United States")
    .ignoreInvalidURLs(true)
    .timeout(60000)
    .scrapeOptions(
        ScrapeOptions.builder()
            .formats(List.of(
                "markdown",
                "links",
                JsonFormat.builder().prompt("Extract title and key endpoints.").build()
            ))
            .onlyMainContent(true)
            .waitFor(1000)
            .build()
    )
    .build();

SearchData results = client.search("site:docs.firecrawl.dev crawl webhooks", options);

Parameters

  • query
    • Type: String
    • Use when: you need a search query.
    • Notes: use site:example.com to limit results to a domain.
  • options.sources
    • Type: List of source strings or maps
    • Use when: you want to control which sources are searched.
    • Confirmed values:
      • "web": web index results
      • "news": news results
      • "images": image results
      • {type: "web" | "news" | "images"}: typed source map form
  • options.categories
    • Type: List of category strings or maps
    • Use when: you want to filter results by category.
    • Confirmed values:
      • "github": GitHub-focused results
      • "research": research and academic results
      • "pdf": PDF-focused results
      • {type: "github" | "research" | "pdf"}: typed category map form
  • options.limit
    • Type: Integer
    • Use when: you want to cap results.
  • options.tbs
    • Type: String
    • Use when: you need a time-based filter (for example qdr:d, qdr:w, sbd:1,qdr:m).
  • options.location
    • Type: String
    • Use when: you want localized results.
  • options.ignoreInvalidURLs
    • Type: Boolean
    • Use when: you want to drop URLs that cannot be scraped by other endpoints.
  • options.timeout
    • Type: Integer
    • Use when: you need a request timeout in milliseconds.
  • options.scrapeOptions
    • Type: ScrapeOptions
    • Use when: you want to scrape each search result (see Scrape parameters for fields).
  • options.integration
    • Type: String
    • Use when: the API expects an integration identifier on the request.
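
The sources and categories parameters accept either plain strings or typed maps. As an illustration of how the two forms relate, the sketch below converts a mixed list into the typed map form only. SourceForms and toTypedMaps are hypothetical names, not SDK methods; the {type: ...} shape is taken from the parameter list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Firecrawl SDK.
final class SourceForms {
    private SourceForms() {}

    /** Converts plain strings to {type: value} maps; existing maps pass through unchanged. */
    static List<Map<String, Object>> toTypedMaps(List<?> sources) {
        List<Map<String, Object>> typed = new ArrayList<>();
        for (Object source : sources) {
            if (source instanceof String) {
                typed.add(Map.of("type", source));
            } else if (source instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> map = (Map<String, Object>) source;
                typed.add(map);
            } else {
                throw new IllegalArgumentException("Unsupported source: " + source);
            }
        }
        return typed;
    }
}
```

For example, a list holding "web" and a map with type "news" becomes two maps, one per source.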

Scrape

Why use it

Use scrape when you already have a URL and want structured content in one or more formats.

Preferred SDK methods

  • client.scrape(url)
  • client.scrape(url, options)

Return value

scrape returns Document. Typical getters include getMarkdown(), getHtml(), getRawHtml(), getJson(), getMetadata(), getLinks(), and additional fields when the corresponding formats are requested.

Simple Example

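import com.firecrawl.models.Document;
import com.firecrawl.models.ScrapeOptions;
import java.util.List;

Document doc = client.scrape(
    "https://docs.firecrawl.dev",
    ScrapeOptions.builder().formats(List.of("markdown")).build()
);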

Complex Example

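import com.firecrawl.models.Document;
import com.firecrawl.models.JsonFormat;
import com.firecrawl.models.LocationConfig;
import com.firecrawl.models.ScrapeOptions;
import java.util.List;
import java.util.Map;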

List<Map<String, Object>> actions = List.of(
    Map.of("type", "click", "selector", "#accept"),
    Map.of("type", "wait", "milliseconds", 750),
    Map.of("type", "scrape")
);

LocationConfig location = LocationConfig.builder()
    .country("US")
    .languages(List.of("en-US"))
    .build();

ScrapeOptions options = ScrapeOptions.builder()
    .formats(List.of(
        "markdown",
        "links",
        JsonFormat.builder().prompt("Extract plan names and prices.").build(),
        Map.of("type", "screenshot", "fullPage", true, "quality", 80)
    ))
    .onlyMainContent(true)
    .waitFor(1000)
    .parsers(List.of(Map.of("type", "pdf", "maxPages", 5)))
    .actions(actions)
    .location(location)
    .removeBase64Images(true)
    .blockAds(true)
    .proxy("auto")
    .maxAge(86400000L)
    .storeInCache(true)
    .build();

Document doc = client.scrape("https://example.com/pricing", options);
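
The maxAge value in the example above, 86400000L, is one day expressed in milliseconds. A small sketch using java.time.Duration can make that intent explicit. CacheAge and toMaxAgeMillis are hypothetical names, not SDK methods, and rejecting negative durations is an assumption of this sketch.

```java
import java.time.Duration;

// Hypothetical helper, not part of the Firecrawl SDK.
final class CacheAge {
    private CacheAge() {}

    /** Converts a Duration to the millisecond count used for maxAge. */
    static long toMaxAgeMillis(Duration age) {
        if (age.isNegative()) {
            throw new IllegalArgumentException("maxAge must not be negative: " + age);
        }
        return age.toMillis();
    }
}
```

The result can then be passed to the builder, for example .maxAge(CacheAge.toMaxAgeMillis(Duration.ofDays(1))).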

Parameters

  • url
    • Type: String
    • Use when: you want to scrape a specific page.
  • options.formats
    • Type: List of format strings or format maps
    • Use when: you want multiple output formats.
    • Confirmed format strings:
      • "markdown": markdown content
      • "html": cleaned HTML
      • "rawHtml": raw HTML
      • "links": page links
      • "images": image URLs
      • "screenshot": screenshot output
      • "summary": summary output
      • "changeTracking": change tracking output
      • "json": JSON extraction
      • "attributes": attribute extraction
      • "branding": branding profile output
      • "audio": audio extraction
    • Format object fields:
      • type: one of the format strings above
      • prompt, schema: JSON extraction options for type: "json"
      • modes, schema, prompt, tag: change tracking options for type: "changeTracking"
      • fullPage, quality, viewport: screenshot options for type: "screenshot"
      • selectors: array of {selector, attribute} for type: "attributes"
  • options.headers
    • Type: Map<String, String>
    • Use when: you need custom request headers.
  • options.includeTags
    • Type: List<String>
    • Use when: you want to include only specific HTML tags.
  • options.excludeTags
    • Type: List<String>
    • Use when: you want to exclude specific HTML tags.
  • options.onlyMainContent
    • Type: Boolean
    • Use when: you want to strip nav, footer, and other boilerplate.
  • options.timeout
    • Type: Integer
    • Use when: you need a timeout in milliseconds.
  • options.waitFor
    • Type: Integer
    • Use when: you need to wait for the page to render (milliseconds).
  • options.mobile
    • Type: Boolean
    • Use when: you want a mobile viewport.
  • options.parsers
    • Type: List<Object>
    • Use when: you need file parsing controls.
    • Confirmed values:
      • "pdf"
      • {type: "pdf", maxPages: number}
  • options.actions
    • Type: List<Map<String, Object>>
    • Use when: you need lightweight pre-scrape actions.
    • Confirmed action types:
      • wait: milliseconds or selector required
      • screenshot: fullPage, quality, viewport optional
      • click: selector required
      • write: text required (click to focus the input first)
      • press: key required
      • scroll: direction (up or down) required, selector optional
      • scrape: no additional fields
      • executeJavascript: script required
      • pdf: format (A0, A1, A2, A3, A4, A5, A6, Letter, Legal, Tabloid, Ledger), landscape, scale optional
  • options.location
    • Type: LocationConfig
    • Use when: you need geo or language-aware scraping.
  • options.skipTlsVerification
    • Type: Boolean
    • Use when: you need to skip TLS verification.
  • options.removeBase64Images
    • Type: Boolean
    • Use when: you want to drop base64 images from markdown output.
  • options.blockAds
    • Type: Boolean
    • Use when: you want ad and cookie popup blocking.
  • options.proxy
    • Type: String
    • Use when: you need proxy control.
    • Confirmed values: "basic", "stealth", "enhanced", "auto", or a custom proxy URL string
  • options.maxAge
    • Type: Long
    • Use when: you want cached data up to a maximum age (milliseconds).
  • options.storeInCache
    • Type: Boolean
    • Use when: you want Firecrawl to cache the result.
  • options.integration
    • Type: String
    • Use when: the API expects an integration identifier on the request.
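
Since actions are passed as untyped maps, a missing required field is only reported once the request reaches the API. The sketch below is a possible client-side check based on the required fields listed under options.actions. ActionCheck is a hypothetical class, not part of the SDK, and the API may apply rules beyond the ones encoded here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the Firecrawl SDK.
final class ActionCheck {
    private ActionCheck() {}

    /** Returns a description of the first problem found, or empty if the action looks valid. */
    static Optional<String> problem(Map<String, Object> action) {
        Object type = action.get("type");
        if (!(type instanceof String)) {
            return Optional.of("missing type");
        }
        switch ((String) type) {
            case "wait":
                // wait accepts either a duration or a selector to wait for
                return action.containsKey("milliseconds") || action.containsKey("selector")
                        ? Optional.empty()
                        : Optional.of("wait needs milliseconds or selector");
            case "click":
                return require(action, "selector");
            case "write":
                return require(action, "text");
            case "press":
                return require(action, "key");
            case "scroll":
                return require(action, "direction");
            case "executeJavascript":
                return require(action, "script");
            case "screenshot":
            case "scrape":
            case "pdf":
                return Optional.empty(); // only optional fields
            default:
                return Optional.of("unknown action type: " + type);
        }
    }

    private static Optional<String> require(Map<String, Object> action, String field) {
        return action.containsKey(field)
                ? Optional.empty()
                : Optional.of(action.get("type") + " needs " + field);
    }
}
```

Running each entry of the actions list through ActionCheck.problem before calling client.scrape surfaces mistakes such as a click without a selector.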

Interact

Why use it

Use interact when a page requires browser actions or code execution after a scrape starts.

Preferred SDK methods

  • client.interact(jobId, code): uses the default language (node) and the API default execution timeout.
  • client.interact(jobId, code, language, timeout): timeout is in seconds (1–300); pass null to omit it and use the API default (30 seconds).
  • client.interact(jobId, code, language, timeout, origin): origin is an optional string for request attribution, sent only when non-null.

Simple Example

import com.firecrawl.models.BrowserExecuteResponse;

BrowserExecuteResponse result = client.interact(
    "<scrapeJobId>",
    "console.log(await page.title());",
    "node",
    60
);

Complex Example

BrowserExecuteResponse result = client.interact(
    "<scrapeJobId>",
    "// Use Playwright page methods here",
    "node",
    120
);

Stop interactive browser

End the scrape-bound browser session when finished.

Preferred SDK method: client.stopInteractiveBrowser(jobId)

import com.firecrawl.models.BrowserDeleteResponse;

BrowserDeleteResponse stopped = client.stopInteractiveBrowser("<scrapeJobId>");

Interact response (BrowserExecuteResponse)

  • isSuccess(): boolean
  • getStdout(), getStderr(), getResult(), getError(): String (may be null)
  • getExitCode(): Integer (may be null)
  • getKilled(): Boolean (may be null) — true when execution was stopped due to timeout
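
Several of these getters return boxed types that may be null, so unboxing them directly can throw a NullPointerException. The sketch below shows one null-safe way to summarize an execution. ExecutionOutcome is a hypothetical class; its parameters stand in for the values of isSuccess(), getKilled(), and getExitCode(), and treating a non-zero exit code as a failure is an assumption.

```java
// Hypothetical helper, not part of the Firecrawl SDK.
final class ExecutionOutcome {
    private ExecutionOutcome() {}

    /** Summarizes an execution from values that mirror the response getters. */
    static String describe(boolean success, Boolean killed, Integer exitCode) {
        if (Boolean.TRUE.equals(killed)) {
            return "timed out"; // null-safe: a null killed value is treated as not killed
        }
        if (!success) {
            return "failed";
        }
        if (exitCode != null && exitCode != 0) {
            return "exited with code " + exitCode;
        }
        return "ok";
    }
}
```

A call might look like ExecutionOutcome.describe(result.isSuccess(), result.getKilled(), result.getExitCode()).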

Stop response (BrowserDeleteResponse)

  • isSuccess(): boolean
  • getSessionDurationMs(): Long (may be null)
  • getCreditsBilled(): Integer (may be null)
  • getError(): String (may be null)

Parameters

  • jobId
    • Type: String
    • Use when: you have a scrape job ID.
  • code
    • Type: String
    • Use when: you want to run code in the browser session.
  • language
    • Type: String
    • Use when: you need a specific runtime.
    • Confirmed values: "python", "node", "bash"
    • Notes: defaults to "node" in the SDK if null.
  • timeout
    • Type: Integer
    • Use when: you need an execution timeout in seconds (1–300). Null omits the field and uses the API default.
  • origin
    • Type: String
    • Use when: you need an optional origin label on the request. Prefer omitting unless your integration requires it.
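
The timeout rule above (1 to 300 seconds, or null to use the API default) can be checked before the request is sent. The following is a minimal sketch; InteractTimeout is a hypothetical class, and whether the SDK performs its own validation is not stated on this page.

```java
// Hypothetical helper, not part of the Firecrawl SDK.
final class InteractTimeout {
    private InteractTimeout() {}

    /** True when the value is null (API default applies) or within 1..300 seconds. */
    static boolean isValid(Integer seconds) {
        return seconds == null || (seconds >= 1 && seconds <= 300);
    }

    /** Returns the value unchanged, or throws if it falls outside the documented range. */
    static Integer requireValid(Integer seconds) {
        if (!isValid(seconds)) {
            throw new IllegalArgumentException("timeout must be 1-300 seconds or null: " + seconds);
        }
        return seconds;
    }
}
```

For example, client.interact(jobId, code, "node", InteractTimeout.requireValid(60)) fails fast on an out-of-range value instead of waiting for an API error.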

Notes

  • Deprecated aliases: scrapeExecute → interact, deleteScrapeBrowser → stopInteractiveBrowser (and the corresponding *Async helpers).
  • The Java SDK exposes code-based interactions only: there is no prompt parameter on interact (unlike some other language SDKs).

Source Of Truth

  • firecrawl/apps/java-sdk/build.gradle.kts
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/client/FirecrawlClient.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/SearchOptions.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/ScrapeOptions.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/SearchData.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/JsonFormat.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/LocationConfig.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/Document.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/BrowserExecuteResponse.java
  • firecrawl/apps/java-sdk/src/main/java/com/firecrawl/models/BrowserDeleteResponse.java
  • firecrawl-docs/api-reference/v2-openapi.json