Apify and Crawlee Official Forum

Prevent automatic reclaim of failed requests

Hi everyone! Hope you're all doing well. I have a small question about Crawlee.

My use case is a little simpler than a crawler; I just want to scrape a single URL every few seconds.

To do this, I create a RequestList with just one URL and start the crawler. Sometimes the crawler hits HTTP errors and the request fails. That's fine with me, since I'm going to run the crawler again a few seconds later anyway; I'd rather the errors were simply ignored than automatically reclaimed.
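
For reference, here's roughly what my setup looks like (a simplified sketch; I'm using CheerioCrawler just as an example, and the URL, handler body, and 5-second interval are placeholders):

JavaScript
import { CheerioCrawler, RequestList } from 'crawlee';

const scrapeOnce = async () => {
    // A fresh single-URL RequestList for every run
    const requestList = await RequestList.open(null, ['https://example.com/status']);

    const crawler = new CheerioCrawler({
        requestList,
        requestHandler: async ({ $ }) => {
            // ...extract whatever is needed, e.g. the page title
            console.log($('title').text());
        },
    });

    await crawler.run();
};

// Repeat every few seconds
while (true) {
    await scrapeOnce().catch((err) => console.error(err));
    await new Promise((resolve) => setTimeout(resolve, 5000));
}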

Is there a way of doing this?
3 comments
You can simply set the maxRequestRetries option to 0:

JavaScript
import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    maxRequestRetries: 0, // fail immediately instead of retrying
    // ...the rest of your crawler options
});
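If you also want the failure itself to be handled quietly instead of being reported, you could additionally pass your own failedRequestHandler. Something along these lines (just a sketch, the handler body is up to you):

JavaScript
const crawler = new BasicCrawler({
    maxRequestRetries: 0,
    // With zero retries, this runs as soon as the request fails once
    failedRequestHandler: async ({ request }, error) => {
        // Don't rethrow; just note the failure, the next run will try again anyway
        console.log(`Skipping ${request.url}: ${error.message}`);
    },
    // ...the rest of your crawler options
});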
Maybe I misunderstood how the lib works, but wouldn't that just make the request go to failed status faster?

Correct me if I'm wrong, but what I understood is:

  • The URL is added to the list of requests;
  • If the request fails, it is retried up to maxRequestRetries times;
  • If it still fails, it is marked as failed and can be reclaimed.