Managing a queue with Redis or something similar, with worker nodes listening on the queue

Question

I'm trying to run Crawlee in production and scale it out, with a cluster of worker nodes ready to crawl pages on request. How can I achieve this?

The RequestQueue basically writes requests to files and doesn't use any queueing system. I couldn't find documentation on how to use a Redis queue or something similar.

Marco · Answer

I'm not aware of such a possibility. Actually, I don't think Crawlee's queues were intended for concurrent access, but rather for keeping track of todo/done jobs within a single execution or across multiple consecutive executions. You would need to develop your own solution to manage and scale workers, or look at existing solutions such as Apify.

darkprince · Answer

If I create a custom RequestQueue that uses Redis, then this should be possible, right?

darkprince · Answer

Or is it possible to use the Apify managed queue and still run the crawler on my own infrastructure instead of in managed actors?

darkprince · Answer

@Marco

Marco · Answer

To the latter question, I'd say no: Apify does not provide on-premise solutions.
Regarding implementing a RequestQueue that uses Redis, I think it would be possible! You can take a look at the code here: https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_queue_v2.ts#L55
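
To make the idea more concrete, here is a rough sketch of the bookkeeping a shared queue needs: deduplication, a pending list, and an in-progress list. It is not Crawlee's implementation and does not extend its RequestQueue. `QueueStore`, `InMemoryStore`, and `SharedRequestQueue` are hypothetical names, and the in-memory store only stands in for Redis so the example runs on its own. The method names loosely mirror `addRequest`, `fetchNextRequest`, and `markRequestHandled`, but take plain URLs as a simplification.

```typescript
// Hypothetical subset of Redis-style operations the queue relies on.
// With a real Redis client these could map to SADD, LPUSH, RPOPLPUSH and LREM.
interface QueueStore {
  // Adds a member to a set; resolves to true only if it was not already present.
  addToSet(key: string, member: string): Promise<boolean>;
  // Pushes a value onto the head of a list.
  pushHead(key: string, value: string): Promise<void>;
  // Moves the tail of one list to the head of another; null if the source is empty.
  moveTailToHead(from: string, to: string): Promise<string | null>;
  // Removes the first occurrence of a value from a list.
  removeFromList(key: string, value: string): Promise<void>;
}

// In-memory stand-in for Redis (an assumption, used only to keep the sketch runnable).
class InMemoryStore implements QueueStore {
  private sets = new Map<string, Set<string>>();
  private lists = new Map<string, string[]>();

  private list(key: string): string[] {
    let items = this.lists.get(key);
    if (!items) {
      items = [];
      this.lists.set(key, items);
    }
    return items;
  }

  async addToSet(key: string, member: string): Promise<boolean> {
    let members = this.sets.get(key);
    if (!members) {
      members = new Set<string>();
      this.sets.set(key, members);
    }
    if (members.has(member)) return false;
    members.add(member);
    return true;
  }

  async pushHead(key: string, value: string): Promise<void> {
    this.list(key).unshift(value);
  }

  async moveTailToHead(from: string, to: string): Promise<string | null> {
    const value = this.list(from).pop();
    if (value === undefined) return null;
    this.list(to).unshift(value);
    return value;
  }

  async removeFromList(key: string, value: string): Promise<void> {
    const items = this.list(key);
    const index = items.indexOf(value);
    if (index !== -1) items.splice(index, 1);
  }
}

// Queue shared by several workers: every worker talks to the same store.
class SharedRequestQueue {
  private store: QueueStore;
  private seenKey: string;
  private pendingKey: string;
  private inProgressKey: string;

  constructor(store: QueueStore, name: string) {
    this.store = store;
    this.seenKey = `${name}:seen`;
    this.pendingKey = `${name}:pending`;
    this.inProgressKey = `${name}:in-progress`;
  }

  // Enqueues a URL unless any worker has already added it; resolves to true if it was new.
  async addRequest(url: string): Promise<boolean> {
    const isNew = await this.store.addToSet(this.seenKey, url);
    if (isNew) await this.store.pushHead(this.pendingKey, url);
    return isNew;
  }

  // Hands out the oldest pending URL and parks it in the in-progress list.
  async fetchNextRequest(): Promise<string | null> {
    return this.store.moveTailToHead(this.pendingKey, this.inProgressKey);
  }

  // Drops a finished URL from the in-progress list.
  async markRequestHandled(url: string): Promise<void> {
    await this.store.removeFromList(this.inProgressKey, url);
  }
}
```

Each worker would create its own `SharedRequestQueue` over the same store and loop on `fetchNextRequest()`. Keeping an in-progress list means a URL claimed by a crashed worker is not lost and could be moved back to pending by a separate recovery step, which is left out here.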

darkprince · Answer

Okay, I will check it out. I guess extending the RequestQueue with Redis would do the trick for me.
