Crawler skipping Jobs after processing 5,000-6,000 Requests

For the past few days, I have been running the crawler with a high number of jobs, and I have run into a problem.

I have found that not all jobs are processed by the CheerioCrawler, despite these jobs being added to the queue through addRequest([job]).

I can't reliably reproduce it; it happens after approximately 5,000-6,000 jobs.

My code doesn't crash; it continues to the next jobs in the BullMQ job queue without scraping the link.
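For reference, Crawlee's documented methods are RequestQueue.addRequest() for a single request and crawler.addRequests() for an array. Here is a minimal sketch of the crawler side of a setup like this, assuming the extracted product is published to Kafka from the requestHandler (the URL and extraction logic are placeholders, not from my actual code):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // keepAlive keeps the crawler running when its queue drains,
    // so requests added later are still processed.
    keepAlive: true,
    requestHandler: async ({ request, $, log }) => {
        const title = $('title').text(); // placeholder extraction
        log.info(`Scraped ${request.url} (${title})`);
        // ... build the product payload and publish it to Kafka here
    },
});

// Started once; with keepAlive the promise does not resolve when the queue is empty.
void crawler.run();

// For each incoming job, feed its URL into the crawler:
await crawler.addRequests([{ url: 'https://example.com/product/1' }]);
```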

This is the normal behavior, where a request reaches the requestHandler (see the CheerioCrawler info log):
[Attachment: image.png]
Here is where it starts misbehaving, and I have no idea why, because the jobs/URLs are valid. It seems the requests no longer reach the crawler:
[Attachment: image.png]
And at this point, my Kafka consumer no longer receives new data (products) from the scraper.
This issue is still there. Does anyone know how to solve it?
Did you use a loop to run your crawler continuously, as long as there are URLs? How do you do that?
I'm using BullMQ, and my CheerioCrawler has keepAlive set to true.

I use cron to dispatch jobs to the crawler. In this case, the worker stops working after the next batch of jobs.
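A minimal sketch of that wiring, assuming a BullMQ Worker consumes the cron-dispatched jobs and feeds each job's URL into the long-lived crawler from the earlier sketch (the queue name, Redis connection, and job payload shape are my assumptions, not from the actual setup):

```ts
import { Worker } from 'bullmq';

// Assumes `crawler` is the keepAlive CheerioCrawler from the sketch above.
const worker = new Worker(
    'scrape-jobs', // hypothetical queue name
    async (job) => {
        // Each cron-dispatched job is assumed to carry one URL in its payload.
        await crawler.addRequests([{ url: job.data.url }]);
    },
    { connection: { host: 'localhost', port: 6379 } },
);

// Surface job failures instead of losing them silently.
worker.on('failed', (job, err) => {
    console.error(`Job ${job?.id} failed:`, err);
});
```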