Apify Discord Mirror

Updated last month

How to retry only failed requests after the crawler has finished?

At a glance

The community member finished a crawler run of around 1.7 million requests, of which around 100,000 failed, and asked whether there is a way to retry just the failed requests. The comments suggest this is not currently supported; one community member recommends pushing failed requests to a dataset or key-value store from the failed-request handler, another notes that retrying them would require writing a new scraper since the crawler consists of multiple route handlers, and a third points to the failedRequestHandler option in the Crawlee library as a single place to handle all failures.

I finished the crawler with around 1.7M requests and got around 100k failed requests. Is there a way to retry just the failed requests?
4 comments
Hey, that's not currently supported. I would recommend creating a dataset/KV store for failed requests and pushing to it from the failed-request handler.
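The suggestion above can be sketched with Crawlee's `failedRequestHandler` option, which runs once a request has exhausted its retries. This is a hedged configuration sketch, not the original poster's code: the dataset name `failed-requests`, the retry count, and the fields being saved are all illustrative choices.

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestRetries: 3,
    async requestHandler({ request }) {
        // ...normal route-handling / scraping logic...
    },
    // Called after a request has failed all of its retries.
    async failedRequestHandler({ request }, error) {
        // Push the failed URL (plus some context) into a named dataset
        // so a later run can re-enqueue exactly these requests.
        const failed = await Dataset.open('failed-requests');
        await failed.pushData({
            url: request.url,
            label: request.label,
            errors: request.errorMessages,
            lastError: error.message,
        });
    },
});
```

Saving the `label` alongside the URL matters here because, as noted in the thread, the crawler has multiple route handlers, and the label is what routes a request to the right one on a re-run.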
Retrying them would require writing a new scraper, since the crawler consists of multiple route handlers.

I thought I could just run the scraper again with an increased retry count.
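If the failures were saved to a dataset as suggested earlier in the thread, re-running just the failures amounts to reading those records back, deduplicating the URLs, and feeding them into a fresh crawler run (e.g. `crawler.run(urls)`). A minimal, dependency-free sketch of the dedup step; the `FailedRecord` shape is an assumption matching what a failed-request handler might push to a dataset:

```typescript
// Assumed shape of one record saved by a failed-request handler.
interface FailedRecord {
    url: string;
    label?: string;
    errors?: string[];
}

// Collapse failed-request records into a unique list of URLs,
// preserving first-seen order, ready to feed to a new crawler run.
function collectRetryUrls(records: FailedRecord[]): string[] {
    const seen = new Set<string>();
    const urls: string[] = [];
    for (const { url } of records) {
        if (!seen.has(url)) {
            seen.add(url);
            urls.push(url);
        }
    }
    return urls;
}
```

A single URL can appear more than once in the saved records (e.g. if it was pushed once per retry batch or per run), so deduplicating before re-enqueueing avoids wasting the retry budget.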