Apify Discord Mirror

Updated 2 years ago

Resume a crawler from a previous run's request queues, locally and on Apify

At a glance

The community member asks whether a crawler can be stopped and resumed from the previous run's request queues, so that proxies can be added to speed up processing without starting from scratch. The replies suggest using a named request queue instead of an unnamed one, since named queues are persisted. Another reply notes that the process can be canceled and restarted without purging the storage by using the `crawlee run --no-purge` command. The commenters also mention that Apify is working on a graceful abort feature to address this issue.

Useful resources
Is it possible to stop a crawler and resume it from the previous run's request queues?

I have a crawler that has been running locally for a couple of hours, and I would like to add proxies to speed up processing because I am getting throttled on a single IP. But I don't want to start from scratch, since that would be an unnecessary waste of time. I want to reuse my existing request queues. Is this possible?

Also, is this possible on Apify?
4 comments
Use a named request queue instead of an unnamed one; named queues are persisted across runs. The default request queue is unnamed and is tied to the actor's run.
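A minimal sketch of this approach, assuming a `CheerioCrawler` project (the queue name, handler, and start URL are illustrative):

```typescript
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Open a named queue. Unlike the default (unnamed) queue, it is not
// purged between runs, so restarting the crawler resumes from the
// pending requests left over from the previous run.
const requestQueue = await RequestQueue.open('my-persistent-queue');

const crawler = new CheerioCrawler({
    requestQueue,
    // A proxyConfiguration could be added here on the second run
    // without losing the queue state accumulated so far.
    async requestHandler({ request, $ }) {
        console.log(`Processing ${request.url}`);
    },
});

// Already-handled requests are deduplicated by the queue, so
// re-adding the start URL on a resumed run is harmless.
await crawler.addRequests(['https://example.com']);
await crawler.run();
```

On the Apify platform the same idea applies: a named request queue lives in your account's storage rather than in a single run, so a new run (with proxies configured) can pick up the same queue.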
You can also do it by canceling the process and then restarting it without purging the storage: `crawlee run --no-purge`.
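As a sketch, a local stop-and-resume with that flag might look like this (the environment-variable alternative is an assumption based on Crawlee's configuration options, and `src/main.js` is a hypothetical entry point):

```shell
# Stop the running crawler (Ctrl+C), then restart it while keeping
# the contents of ./storage (request queues, datasets, key-value stores):
crawlee run --no-purge

# Alternative when launching the script directly: disable the purge
# through Crawlee's CRAWLEE_PURGE_ON_START environment variable.
CRAWLEE_PURGE_ON_START=0 node src/main.js
```

Without `--no-purge`, the default local storage (including the unnamed request queue) is wiped at startup, which is why a plain restart loses progress.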

We are also figuring out graceful abort - https://github.com/apify/crawlee/issues/1531