Crawler stopped abruptly and exited with a success message

I'm running a crawler on Apify. In the logs, these two messages stand out:

...
2024-08-06T22:02:30.894Z ERROR HttpCrawler: An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler.

...

2024-08-06T22:02:34.090Z ERROR An error occurred during crawling:

FYI:
CPU
Average: 8.95%
Maximum: 29.11%

Memory
Average: 179.2 MB
Maximum: 335.1 MB

Request Queue tab:
NAME: Unnamed
TOTAL: 921
PENDING: 101
HANDLED: 820
READS: 995
WRITES: 3894
DELETES: 0
HEAD ITEM READS: 100
STORAGE SIZE: 196.7 kB

Any clues on how I can even begin to troubleshoot this?

6 comments

oondro_k

Hey, could you share the run's ID?

rrico

Thanks for replying, and apologies for the delay. Run ID: p1uNn3ltHeW2zii8o

oondro_k

Thanks for the ID. The crawler shouldn’t have crashed like that, and we’ll look into why it happened. You can resurrect the run to let it continue, but you’re getting blocked so often that I wouldn’t recommend it. You’re using an HttpCrawler, and it’s running into Cloudflare Turnstile, which blocks it. To get past that, you would need to use a browser with fingerprints.

rrico

Thanks for looking into it. So to bypass it, would using the CheerioCrawler be sufficient?

ppollon

Yeah, I think so

rrico

I've got a bunch of 403s, but that doesn't necessarily mean Cloudflare Turnstile blocked them. How can I tell whether a 403 comes from Cloudflare or not?
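
One way to approach the question above is to look at the response headers and body of each 403. The sketch below is only a heuristic, not an official Apify or Crawlee API: `classify403`, `getHeader`, `HeaderMap`, and `Verdict` are hypothetical names made up for illustration. It assumes that Cloudflare-proxied responses carry a `server: cloudflare` or `cf-ray` header, that challenge responses carry `cf-mitigated: challenge`, and that challenge pages reference `challenges.cloudflare.com` or `/cdn-cgi/challenge-platform/` in the HTML. How you get hold of the status code, headers, and body depends on your crawler setup and is not shown here.

```typescript
// Hypothetical helper types: header names mapped to one or more values.
type HeaderValue = string | string[] | undefined;
type HeaderMap = Record<string, HeaderValue>;

type Verdict = 'not-403' | 'cloudflare-challenge' | 'via-cloudflare' | 'other';

// Case-insensitive header lookup; returns a lowercased value or ''.
function getHeader(headers: HeaderMap, name: string): string {
    for (const [key, value] of Object.entries(headers)) {
        if (key.toLowerCase() === name) {
            return (Array.isArray(value) ? value.join(',') : value ?? '').toLowerCase();
        }
    }
    return '';
}

// Heuristic classification of a response (assumption: the signals listed
// in the lead-in are present; Cloudflare may change them at any time).
function classify403(statusCode: number, headers: HeaderMap, body: string): Verdict {
    if (statusCode !== 403) return 'not-403';

    // The response passed through Cloudflare's proxy at all.
    const viaCloudflare =
        getHeader(headers, 'server').includes('cloudflare') ||
        getHeader(headers, 'cf-ray') !== '';

    // The response looks like a Cloudflare challenge page (e.g. Turnstile).
    const challenged =
        getHeader(headers, 'cf-mitigated') === 'challenge' ||
        body.includes('challenges.cloudflare.com') ||
        body.includes('/cdn-cgi/challenge-platform/');

    if (challenged) return 'cloudflare-challenge';
    // A 403 that came through Cloudflare without challenge markers may
    // come from the origin server or from another Cloudflare rule.
    return viaCloudflare ? 'via-cloudflare' : 'other';
}
```

A `via-cloudflare` result only says the site sits behind Cloudflare. The 403 itself could still come from the origin server, so it is worth inspecting a few of those response bodies by hand.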

Apify and Crawlee Official Forum
