Apify and Crawlee Official Forum

Updated 3 months ago

Best practice to stop/crash the actor/crawler on high ratio of errors?

The following snippet works well for me, but it smells... does somebody have a cleaner approach?

JavaScript
// Every 3s, compare the counts of finished (= succeeded) and failed requests
// and stop the process if failures run too far ahead
setInterval(() => {
  const { requestsFinished, requestsFailed } = crawler.stats.state
  if (requestsFailed > requestsFinished + 10) { // failed 10 more than finished -> stop trying
    console.warn(`πŸ’£ Too many failed requests, stopping! (${requestsFailed} failed, ${requestsFinished} finished)`)
    process.exit(1)
  }
}, 3000)
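
One possibly cleaner variant (a sketch only, not an official recommendation: it assumes a CheerioCrawler and that aborting the autoscaled pool is an acceptable way to end the run) keeps the same check but lets crawler.run() resolve instead of killing the process:

JavaScript
import { CheerioCrawler } from 'crawlee'

const crawler = new CheerioCrawler({
  async requestHandler({ request }) {
    // ... your scraping logic ...
    console.log(`Processed ${request.url}`)
  },
})

// Every 3s, compare finished vs. failed requests; if failures run away,
// abort the autoscaled pool so crawler.run() resolves and normal teardown happens
const watchdog = setInterval(async () => {
  const { requestsFinished, requestsFailed } = crawler.stats.state
  if (requestsFailed > requestsFinished + 10) {
    console.warn(`πŸ’£ Too many failed requests, aborting! (${requestsFailed} failed, ${requestsFinished} finished)`)
    clearInterval(watchdog)
    await crawler.autoscaledPool?.abort()
  }
}, 3000)

await crawler.run(['https://example.com'])
clearInterval(watchdog)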
3 comments
There is now a message on Apify that I guess comes from the crawler when there are problems, so maybe you can use that if you find out what is generating it.
[Attachment: image.png]
This guy knows stuff πŸ™
You can use stats (https://crawlee.dev/api/browser-crawler/class/BrowserCrawler#stats), but the approach itself is not safe: you are supposed to resolve blocking by handling sessions and/or bot protection in your logic, not by hammering the website with many runs. I.e. set concurrency and max request retries, add logic for session.markBad(), etc., and implement a scalable crawler.
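
In Crawlee terms, that advice might translate to something like the sketch below (the concrete numbers, the blocked-page check, and the start URL are illustrative assumptions, not recommendations):

JavaScript
import { CheerioCrawler } from 'crawlee'

const crawler = new CheerioCrawler({
  maxConcurrency: 10,    // cap parallelism instead of hammering the site
  maxRequestRetries: 3,  // give up on a single request after a few attempts
  useSessionPool: true,  // rotate sessions instead of retrying blindly
  persistCookiesPerSession: true,
  async requestHandler({ request, session, $ }) {
    // If the page looks like a block/captcha page, mark the session bad and throw
    // so Crawlee retries the request with a different session
    const title = $('title').text().toLowerCase()
    if (title.includes('captcha') || title.includes('access denied')) {
      session?.markBad()
      throw new Error(`Blocked on ${request.url}`)
    }
    // ... extract data with $ here ...
  },
})

await crawler.run(['https://example.com'])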