Apify and Crawlee Official Forum

Updated 2 weeks ago

Managing the queue with Redis or something similar, with worker nodes listening on the queue

I'm trying to run Crawlee in production and scale it out with a cluster of worker nodes that are ready to crawl pages on request. How can I achieve this?

The RequestQueue basically writes requests to files and does not use any queueing system. I couldn't find any documentation on how to use a Redis queue or something similar.
7 comments
I'm not aware of such a possibility. Actually, I don't think Crawlee's queues were intended for concurrent access; they are meant for keeping track of todo/done jobs within a single execution, or across multiple subsequent executions. You would need to develop your own solution to manage and scale workers, or look at existing solutions such as Apify.
If I create a custom RequestQueue that uses Redis, then this should be possible, right?
Or can I use the Apify-managed queue and still run the crawler on my own infrastructure instead of in managed actors?
To the latter question, I'd say no: Apify does not provide on-premise solutions.
Regarding implementing a RequestQueue that uses Redis, I think it would be possible! You can take a look at the code here: https://github.com/apify/crawlee/blob/master/packages/core/src/storages/request_queue_v2.ts#L55
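To make the idea a bit more concrete, here is a rough sketch of what a shared, Redis-style queue could look like. It is not Crawlee's actual RequestQueue and it does not extend it. `RedisLike`, `InMemoryRedis`, `SharedRequestQueue` and `runWorker` are hypothetical names made up for illustration, and the method names only loosely mirror Crawlee's `addRequest`, `fetchNextRequest` and `markRequestHandled`. The in-memory class stands in for a Redis client so the snippet runs without a server; the three commands it assumes (`sadd`, `lpush`, `rpop`) should be checked against whichever client you pick.

```typescript
// Hypothetical minimal subset of Redis commands this sketch relies on.
// Assumption: your Redis client exposes equivalents with these shapes.
interface RedisLike {
  sadd(key: string, member: string): Promise<number>; // 1 if newly added, 0 if already present
  lpush(key: string, value: string): Promise<number>; // new list length
  rpop(key: string): Promise<string | null>;          // null when the list is empty
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryRedis implements RedisLike {
  private sets = new Map<string, Set<string>>();
  private lists = new Map<string, string[]>();

  async sadd(key: string, member: string): Promise<number> {
    const set = this.sets.get(key) ?? new Set<string>();
    this.sets.set(key, set);
    if (set.has(member)) return 0;
    set.add(member);
    return 1;
  }

  async lpush(key: string, value: string): Promise<number> {
    const list = this.lists.get(key) ?? [];
    this.lists.set(key, list);
    list.unshift(value);
    return list.length;
  }

  async rpop(key: string): Promise<string | null> {
    return this.lists.get(key)?.pop() ?? null;
  }
}

interface QueuedRequest {
  url: string;
  uniqueKey: string;
}

// Hypothetical shared queue: a "seen" set for deduplication plus a pending list.
class SharedRequestQueue {
  constructor(private redis: RedisLike, private name: string) {}

  // Enqueue the URL only if no worker has enqueued the same uniqueKey before.
  async addRequest(url: string): Promise<boolean> {
    const request: QueuedRequest = { url, uniqueKey: url };
    const isNew = await this.redis.sadd(`${this.name}:seen`, request.uniqueKey);
    if (isNew === 0) return false;
    await this.redis.lpush(`${this.name}:pending`, JSON.stringify(request));
    return true;
  }

  // Pop the oldest pending request, or null when nothing is pending.
  async fetchNextRequest(): Promise<QueuedRequest | null> {
    const raw = await this.redis.rpop(`${this.name}:pending`);
    return raw === null ? null : (JSON.parse(raw) as QueuedRequest);
  }

  // Record that a request was processed.
  async markRequestHandled(request: QueuedRequest): Promise<void> {
    await this.redis.sadd(`${this.name}:handled`, request.uniqueKey);
  }
}

// Worker loop: drain the queue, calling `handle` for each URL.
// Returns the number of requests this worker handled.
async function runWorker(
  queue: SharedRequestQueue,
  handle: (url: string) => Promise<void>,
): Promise<number> {
  let handled = 0;
  let request = await queue.fetchNextRequest();
  while (request !== null) {
    await handle(request.url);
    await queue.markRequestHandled(request);
    handled++;
    request = await queue.fetchNextRequest();
  }
  return handled;
}
```

Each worker node would run something like `runWorker` against the same queue name. Note that a plain `rpop` loses the request if the worker crashes before finishing it, so a production version would likely need some in-progress tracking and retry logic on top of this.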
Okay, I will check it out. I guess extending the RequestQueue with Redis would do the trick for me.