Queues Web Crawler Example

An example use case for Queues: a web crawler built on Browser Rendering and Puppeteer. The crawler counts the number of links to Cloudflare.com on each submitted site and archives a screenshot of the page to Workers KV.
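To make that flow concrete, here is a minimal sketch of what the queue consumer could look like, assuming TypeScript, a Browser Rendering binding named CRAWLER_BROWSER, and plain URL strings as queue messages; the actual implementation lives in the consumer directory and may differ.

```ts
// Sketch of a Queues consumer that crawls a page with Browser Rendering.
// The browser binding name and the message shape are illustrative assumptions.
import puppeteer from "@cloudflare/puppeteer";

interface Env {
  CRAWLER_BROWSER: Fetcher;             // Browser Rendering binding (name assumed)
  CRAWLER_LINKS_KV: KVNamespace;        // link counts, keyed by URL
  CRAWLER_SCREENSHOTS_KV: KVNamespace;  // screenshots, keyed by URL
}

export default {
  async queue(batch: MessageBatch<string>, env: Env): Promise<void> {
    // One browser session per batch keeps Puppeteer start-up overhead low.
    const browser = await puppeteer.launch(env.CRAWLER_BROWSER);

    for (const message of batch.messages) {
      const url = message.body;
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle0" });

      // Count anchors that point at cloudflare.com.
      const hrefs = await page.$$eval("a", (anchors) =>
        anchors.map((a) => a.getAttribute("href") ?? "")
      );
      const count = hrefs.filter((href) => href.includes("cloudflare.com")).length;
      await env.CRAWLER_LINKS_KV.put(url, count.toString());

      // Archive a screenshot of the page.
      const screenshot = (await page.screenshot()) as Buffer;
      await env.CRAWLER_SCREENSHOTS_KV.put(url, screenshot);

      await page.close();
    }

    await browser.close();
  },
};
```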

For this project, Queues batches the sites to be crawled, which limits the overhead of opening and closing new Puppeteer instances. Because loading pages and scraping links takes time, Queues makes it possible to respond to inbound crawl requests instantly while providing peace of mind that the long-running crawl will be triggered. Queues also helps absorb bursty traffic and improves reliability.
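As a sketch of the producer side, a Pages Function can accept a crawl request and push it onto the queue via the CRAWLER_QUEUE binding configured in the steps below; the route and request body shape here are assumptions, not the repository's exact API.

```ts
// Sketch of a Pages Function producer: accept a crawl request, enqueue it,
// and respond immediately. The request body shape is illustrative.
interface Env {
  CRAWLER_QUEUE: Queue<string>; // Queue producer binding (name set in step 4 below)
}

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  const { url } = await request.json<{ url: string }>();
  if (!url) {
    return new Response("Missing url", { status: 400 });
  }

  // The long-running crawl happens later in the queue consumer;
  // the client gets an instant acknowledgement here.
  await env.CRAWLER_QUEUE.send(url);
  return new Response("Crawl queued", { status: 202 });
};
```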

Products used: Pages Functions, Queues, and Browser Rendering

Development

This assumes you have access to the Browser Rendering feature - you can join the waitlist here.

First, fork this project. Install Node.js and Wrangler, and run npm install.

Then, to configure your project and deploy on Cloudflare Workers:

  1. Go to the Dash and click on Workers & Pages > Queues > Create queue. Enter a Queue name.
  2. In the pages directory, run wrangler pages deploy . and enter a project name (PROJECT_NAME).
  3. Go to the Dash and click on Workers & Pages > Overview > PROJECT_NAME > Settings > Functions > Queue Producers bindings > Add binding.
  4. Set the variable name to CRAWLER_QUEUE and select your queue as the Queue name. Click "Save".
  5. In the Dash, click on Workers & Pages > KV > Create a namespace. Create one namespace called crawler_screenshots and one called crawler_links.
  6. Create two KV namespace bindings. Set CRAWLER_LINKS_KV as the first binding's variable name and crawler_links as its KV namespace. Then set CRAWLER_SCREENSHOTS_KV as the second's variable name and crawler_screenshots as its KV namespace.
  7. In the consumer directory, update the wrangler.toml file with your new KV namespace IDs. Also update the [[queues.consumers]] queue name to the Queue you created (see the sketch after this list).
  8. In the consumer directory, run wrangler deploy.
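For reference, after step 7 the consumer's wrangler.toml ends up looking roughly like the sketch below. Every value is a placeholder (including the browser binding name and batch settings), so keep whatever the repository already defines and only swap in your own namespace IDs and queue name.

```toml
# Rough shape of consumer/wrangler.toml after step 7 (values are placeholders).
name = "queues-web-crawler-consumer"
main = "src/index.ts"
compatibility_date = "2023-05-12"

# Browser Rendering binding used by Puppeteer (binding name illustrative).
browser = { binding = "CRAWLER_BROWSER" }

# KV namespaces created in step 5, bound under the names from step 6.
kv_namespaces = [
  { binding = "CRAWLER_LINKS_KV", id = "<crawler_links namespace ID>" },
  { binding = "CRAWLER_SCREENSHOTS_KV", id = "<crawler_screenshots namespace ID>" }
]

# Consume from the queue created in step 1.
[[queues.consumers]]
queue = "<your queue name>"
max_batch_size = 10
max_batch_timeout = 30
```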

Your Queues-powered web crawler will be live!
