SourceSync.ai is a Retrieval-Augmented Generation (RAG) as a Service platform that helps you build AI applications with your own data. This guide explains how to use Firecrawl as the web scraping provider for SourceSync.ai.
Setup
First, obtain your Firecrawl API key from your Firecrawl dashboard.
Configure your SourceSync.ai namespace to use Firecrawl as the web scraping provider:
curl -X PATCH https://api.sourcesync.ai/v1/namespaces/YOUR_NAMESPACE_ID \
  -H "Authorization: Bearer YOUR_SOURCE_SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "webScraperConfig": {
      "provider": "FIRECRAWL",
      "apiKey": "YOUR_FIRECRAWL_API_KEY"
    }
  }'
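If you would rather not paste keys directly into the command, the same request can be wrapped in a small shell function that reads them from environment variables. This is only a sketch: the variable names `SOURCESYNC_API_KEY`, `SOURCESYNC_NAMESPACE_ID` and `FIRECRAWL_API_KEY`, and both function names, are assumptions made for illustration and are not part of the SourceSync.ai API.

```shell
#!/usr/bin/env bash

# Prints the JSON body for the namespace PATCH request.
# Assumes the key ($1) contains no double quotes or backslashes.
build_scraper_config() {
  printf '{"webScraperConfig": {"provider": "FIRECRAWL", "apiKey": "%s"}}' "$1"
}

# Sends the same PATCH request as the curl command above, taking the
# namespace ID and both keys from (hypothetical) environment variables.
configure_firecrawl() {
  curl -X PATCH "https://api.sourcesync.ai/v1/namespaces/${SOURCESYNC_NAMESPACE_ID}" \
    -H "Authorization: Bearer ${SOURCESYNC_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_scraper_config "${FIRECRAWL_API_KEY}")"
}
```

After exporting the three variables, calling `configure_firecrawl` sends the request. Nothing is sent until the function is called.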
Usage
Once configured, you can use SourceSync.ai’s web scraping endpoints with Firecrawl’s capabilities. Here are the main ingestion methods:
URL List Ingestion
Scrape specific URLs:
curl -X POST https://api.sourcesync.ai/v1/ingest/urls \
  -H "Authorization: Bearer YOUR_SOURCE_SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "YOUR_NAMESPACE_ID",
    "ingestConfig": {
      "source": "URLS_LIST",
      "config": {
        "urls": [
          "https://example.com/page1",
          "https://example.com/page2"
        ],
        "scrapeOptions": {
          "includeSelectors": ["article", "main"],
          "excludeSelectors": [".navigation", ".footer"]
        }
      }
    }
  }'
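When the URL list is long or generated by another tool, writing the JSON array by hand gets error-prone. The sketch below builds the same request body as the example above from command-line arguments. The function names are hypothetical, and the simple string joining assumes the URLs contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash

# Joins its arguments into a JSON array of strings.
# Assumes no argument contains a double quote or backslash.
urls_to_json_array() {
  local out="" url
  for url in "$@"; do
    out="${out:+$out, }\"$url\""
  done
  printf '[%s]' "$out"
}

# Prints a URLS_LIST ingest body for namespace $1 and the URLs that follow,
# reusing the scrapeOptions from the example above.
build_urls_payload() {
  local namespace_id="$1"
  shift
  printf '{"namespaceId": "%s", "ingestConfig": {"source": "URLS_LIST", "config": {"urls": %s, "scrapeOptions": {"includeSelectors": ["article", "main"], "excludeSelectors": [".navigation", ".footer"]}}}}' \
    "$namespace_id" "$(urls_to_json_array "$@")"
}
```

The output of `build_urls_payload YOUR_NAMESPACE_ID https://example.com/page1 https://example.com/page2` can then be passed to curl with `-d "$(...)"` in place of the inline JSON.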
Website Crawling
Crawl an entire website with custom rules:
curl -X POST https://api.sourcesync.ai/v1/ingest/website \
  -H "Authorization: Bearer YOUR_SOURCE_SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "YOUR_NAMESPACE_ID",
    "ingestConfig": {
      "source": "WEBSITE",
      "config": {
        "url": "https://example.com",
        "maxDepth": 3,
        "maxLinks": 100,
        "includePaths": ["/docs", "/blog"],
        "excludePaths": ["/admin"],
        "scrapeOptions": {
          "includeSelectors": ["article", "main"],
          "excludeSelectors": [".navigation", ".footer"]
        }
      }
    }
  }'
Sitemap Processing
Process all URLs from a sitemap:
curl -X POST https://api.sourcesync.ai/v1/ingest/sitemap \
  -H "Authorization: Bearer YOUR_SOURCE_SYNC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "YOUR_NAMESPACE_ID",
    "ingestConfig": {
      "source": "SITEMAP",
      "config": {
        "url": "https://example.com/sitemap.xml",
        "scrapeOptions": {
          "includeSelectors": ["article", "main"],
          "excludeSelectors": [".navigation", ".footer"]
        }
      }
    }
  }'
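The three ingestion requests differ only in the endpoint path and the JSON body, so they can share one wrapper. The sketch below maps each `source` value to the endpoint used in the examples above and posts a body to it. The function names and the `SOURCESYNC_API_KEY` environment variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Maps an ingestConfig "source" value to the endpoint used in the examples above.
# Prints an error and returns 1 for any other value.
ingest_endpoint_for() {
  case "$1" in
    URLS_LIST) echo "https://api.sourcesync.ai/v1/ingest/urls" ;;
    WEBSITE)   echo "https://api.sourcesync.ai/v1/ingest/website" ;;
    SITEMAP)   echo "https://api.sourcesync.ai/v1/ingest/sitemap" ;;
    *)         echo "unknown source: $1" >&2; return 1 ;;
  esac
}

# Posts a JSON body ($2) to the endpoint for the given source ($1).
# SOURCESYNC_API_KEY is a hypothetical environment variable name.
sourcesync_ingest() {
  local endpoint
  endpoint="$(ingest_endpoint_for "$1")" || return 1
  curl -X POST "$endpoint" \
    -H "Authorization: Bearer ${SOURCESYNC_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

For example, `sourcesync_ingest SITEMAP "$(cat sitemap-body.json)"` would post the contents of a local file (here a made-up `sitemap-body.json`) to the sitemap endpoint.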
Features
When using Firecrawl with SourceSync.ai, you get access to:
JavaScript rendering support
Automatic rate limiting
CSS selector-based content extraction
Recursive crawling with depth control
Sitemap processing
Resources
For additional support: