Apify Discord Mirror

Updated 2 years ago

Resume a crawler from a previous run's request queues, locally and on Apify

At a glance

The community member asks whether a crawler can be stopped and resumed from the previous run's request queues, so that proxies can be added to speed up processing without starting from scratch. The replies suggest using a named request queue instead of an unnamed one, since named queues are persisted. Another reply notes that the process can be canceled and restarted without purging the storage by using the `crawlee run --no-purge` command. The commenters also mention that Apify is working on a graceful abort feature to address this issue.

Useful resources
Is it possible to stop a crawler and resume it from the previous run's request queues?

I have a crawler that has been running locally for a couple of hours, and I would like to add proxies to speed up processing because I am getting throttled on a single IP. But I don't want to start from scratch, since that would be an unnecessary waste of time. I want to reuse my existing request queues. Is this possible?

Also, is this possible on Apify?
4 comments
Use a named request queue instead of an unnamed one; named queues are persisted across runs. The default request queue is unnamed and is tied to the actor's run.
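A minimal sketch of this approach, assuming a `CheerioCrawler` project (the queue name, handler, and start URL are illustrative):

```typescript
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Open a named queue. Unlike the default (unnamed) queue, it is not
// purged between runs, so restarting the crawler resumes from the
// pending requests left over from the previous run.
const requestQueue = await RequestQueue.open('my-persistent-queue');

const crawler = new CheerioCrawler({
    requestQueue,
    // A proxyConfiguration could be added here on the second run
    // without losing the queue state accumulated so far.
    async requestHandler({ request, $ }) {
        console.log(`Processing ${request.url}`);
    },
});

// Already-handled requests are deduplicated by the queue, so
// re-adding the start URL on a resumed run is harmless.
await crawler.addRequests(['https://example.com']);
await crawler.run();
```

On the Apify platform the same idea applies: a named request queue lives in your account's storage rather than in a single run, so a new run (with proxies configured) can pick up the same queue.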
You can also do it by canceling the process and then restarting it without purging the storage: `crawlee run --no-purge`.
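As a sketch, a local stop-and-resume with that flag might look like this (the environment-variable alternative is an assumption based on Crawlee's configuration options, and `src/main.js` is a hypothetical entry point):

```shell
# Stop the running crawler (Ctrl+C), then restart it while keeping
# the contents of ./storage (request queues, datasets, key-value stores):
crawlee run --no-purge

# Alternative when launching the script directly: disable the purge
# through Crawlee's CRAWLEE_PURGE_ON_START environment variable.
CRAWLEE_PURGE_ON_START=0 node src/main.js
```

Without `--no-purge`, the default local storage (including the unnamed request queue) is wiped at startup, which is why a plain restart loses progress.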

We are also figuring out graceful abort - https://github.com/apify/crawlee/issues/1531