> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firecrawl.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# 抓取 GitHub

> 了解如何使用 Firecrawl 的核心功能抓取 GitHub

了解如何使用 Firecrawl 的核心功能抓取 GitHub 的仓库、议题（issues）和文档。

<div id="setup">
  ## 设置
</div>

```bash theme={null}
npm install firecrawl zod
```

<div id="scrape-with-json-mode">
  ## 使用 JSON 模式抓取
</div>

通过 Zod schema 从仓库中提取结构化数据。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';
import { z } from 'zod';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
    formats: [{
        type: 'json',
        schema: z.object({
            name: z.string(),
            description: z.string(),
            stars: z.number(),
            forks: z.number(),
            language: z.string(),
            topics: z.array(z.string())
        })
    }]
});

console.log(result.json);
```

<div id="search">
  ## 搜索
</div>

搜索 GitHub 上的仓库、issue 或文档。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const searchResult = await firecrawl.search('machine learning site:github.com', {
    limit: 10,
    sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' }
    scrapeOptions: {
        formats: ['markdown']
    }
});

console.log(searchResult);
```

<div id="scrape">
  ## 抓取
</div>

抓取单个 GitHub 页面 (仓库、issue 或文件) 。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
    formats: ['markdown'] // 例如 html、links 等
});

console.log(result);
```

<div id="map">
  ## 映射
</div>

发现仓库或文档站点中所有可用的 URL。请注意：Map 仅返回 URL，不包含页面内容。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const mapResult = await firecrawl.map('https://github.com/vercel/next.js/tree/canary/docs');

console.log(mapResult.links);
// 返回不含内容的 URL 数组
```

<div id="crawl">
  ## Crawl
</div>

从仓库或文档中抓取多个页面。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const crawlResult = await firecrawl.crawl('https://github.com/facebook/react/wiki', {
    limit: 10,
    scrapeOptions: {
        formats: ['markdown']
    }
});

console.log(crawlResult.data);
```

<div id="batch-scrape">
  ## 批量抓取
</div>

同时抓取多个 GitHub 链接。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// 等待完成
const job = await firecrawl.batchScrape([
    'https://github.com/vercel/next.js',
    'https://github.com/facebook/react',
    'https://github.com/microsoft/typescript'],
    {
        options: {
            formats: ['markdown']
        },
        pollInterval: 2,
        timeout: 120
    }
);


console.log(job.status, job.completed, job.total);

console.log(job);
```

<div id="batch-scrape-with-json-mode">
  ## 使用 JSON 模式批量抓取
</div>

一次性从多个仓库中抽取结构化数据。

```typescript theme={null}
import { Firecrawl } from 'firecrawl';
import { z } from 'zod';

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// 等待完成
const job = await firecrawl.batchScrape([
    'https://github.com/vercel/next.js',
    'https://github.com/facebook/react'],
    {
        options: {
            formats: [{
                type: 'json',
                schema: z.object({
                    name: z.string(),
                    description: z.string(),
                    stars: z.number(),
                    language: z.string()
                })
            }]
        },
        pollInterval: 2,
        timeout: 120
    }
);


console.log(job.status, job.completed, job.total);

console.log(job);
```
