Apify Discord Mirror

Updated 3 weeks ago

Shared external queue between multiple crawlers

At a glance

The community member is asking if there is a way to force Cheerio/Playwright crawlers to stop using their own internal request queue and instead "enqueue links" to another queue service such as Redis. The goal is to be able to run multiple crawlers on a single website and share the same queue to avoid duplicate links.

Another community member responds that the request queue is managed by Crawlee, not Cheerio or Playwright directly. They suggest creating a custom RequestQueue that inherits Crawlee's class, and then passing the custom queue to the (Cheerio/Playwright) Crawler.

Hello folks!
Is there any way I can force Cheerio/Playwright crawlers to stop using their own internal request queue and instead "enqueue links" to another queue service such as Redis? I want to run multiple crawlers on a single website, and they need to share the same queue so they don't process duplicate links.
Thanks in advance!
Marked as solution
Hello!
The request queue is managed by Crawlee, not by Cheerio or Playwright directly. What you could try is creating a custom RequestQueue that inherits from Crawlee's class: https://crawlee.dev/api/core/class/RequestQueue. Here is the source code: https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_queue.ts#L77.
Then, you could pass the custom queue to the (Cheerio/Playwright) Crawler: https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler#requestQueue.
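To make the idea a bit more concrete, here is a rough sketch of the deduplication behaviour such a shared queue would need. It is not Crawlee's API: `SharedQueue`, `QueuedRequest`, and `toUniqueKey` are hypothetical names, the in-memory `Set` and array only stand in for a Redis set and list, and stripping the URL fragment is just an assumed normalization rule. A real Redis-backed version would be asynchronous and would live inside a custom class extending Crawlee's RequestQueue (most likely overriding methods such as `addRequest` and `fetchNextRequest`).

```typescript
// Hypothetical sketch: these names are illustrative and not part of Crawlee.
interface QueuedRequest {
  url: string;
  uniqueKey: string;
}

// Assumed normalization rule: drop the fragment so "#..." variants of the
// same page collapse into a single key.
function toUniqueKey(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

class SharedQueue {
  // Stands in for a Redis set of already-seen keys.
  private readonly seen = new Set<string>();
  // Stands in for a Redis list of pending requests.
  private readonly pending: QueuedRequest[] = [];

  // Returns true if the URL was newly enqueued, false if it was a duplicate.
  add(url: string): boolean {
    const uniqueKey = toUniqueKey(url);
    if (this.seen.has(uniqueKey)) {
      return false;
    }
    this.seen.add(uniqueKey);
    this.pending.push({ url, uniqueKey });
    return true;
  }

  // Hands out the oldest pending request, or undefined when the queue is empty.
  next(): QueuedRequest | undefined {
    return this.pending.shift();
  }
}

// Two crawlers discover overlapping links and push them to the same queue.
const queue = new SharedQueue();
const foundByCrawlerA = ["https://example.com/", "https://example.com/about"];
const foundByCrawlerB = ["https://example.com/about#team", "https://example.com/contact"];

const accepted = foundByCrawlerA
  .concat(foundByCrawlerB)
  .filter((url) => queue.add(url));
```

Here `accepted` contains only the URLs that were actually enqueued, so the `/about#team` link found by the second crawler is dropped as a duplicate of `/about`. With Redis, the "check and add" step would need to be atomic (for example, relying on the return value of a set-add command) so that two crawlers cannot enqueue the same link at the same moment.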