Apify Discord Mirror

Updated 5 months ago

PlaywrightCrawler actor not finishing requestQueue

At a glance

The community member has a Playwright Actor that is not processing all 10 URLs in its queue. The Actor processes between 4 and 7 URLs, then the log shows repeated statistics messages. This issue occurs both locally and on the Apify platform. The community member has shared a run URL, but no one has been able to determine the cause of the issue. One community member had a similar problem with a Playwright crawler in Crawlee, but the issues seem unrelated. Another community member tried starting their project from scratch using the base Playwright Actor provided by Apify, which resolved the problem for them, but the original poster has not found a solution.

I have a Playwright Actor that has 10 URLs added to its queue before I kick it off with .run(). But the Actor doesn't finish all 10 URLs. It processes between 4 and 7, then the log for the run just shows a statistics message repeated every second.

Note that this happens in my local runs of this Actor as well. The total number of URLs scraped (out of 10) varies from run to run, minimum 1 URL and max 7 (of 10 total).

This is the message it shows on repeat, both locally and on the Apify platform:
2024-05-22T22:34:24.274Z INFO  Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":35781,"requestsFinishedPerMinute":2,"requestsFailedPerMinute":0,"requestTotalDurationMillis":143124,"requestsTotal":4,"crawlerRuntimeMillis":120866,"retryHistogram":[4]}
2024-05-22T22:34:24.301Z INFO  PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":6,"desiredConcurrency":11,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}


Why would it stop pulling from the requestQueue? There are no errors in the Actor prior to this.
8 comments
Hi, could you share the ID (or URL) of your run?
Hi, did you find out what was wrong? I am having the same issue while using a Playwright browser in Crawlee.
I have had a similar problem with a Playwright crawler: it finished while there were still pending requests in the queue.
But now this happened: pending requests = -5. How can this happen?
Attachment
image.png
I never figured out why this was happening. I wound up starting my project from scratch again from the base Playwright Actor provided by Apify, and I haven't had this problem again.
These two issues seem unrelated. In the run from kenny, all requests get fetched from the queue, but then the Actor stalls while handling them.
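
The reading of kenny's run can be cross-checked against the two log lines quoted in the original post. The sketch below is only an illustration: `parsePayload` and `summarizeStall` are hypothetical helper names, the log lines are trimmed to the fields used, and it assumes that `requestsTotal` counts requests whose handling has ended (finished or failed) and that `currentConcurrency` counts handlers still running.

```typescript
// Log lines from the original post, trimmed to the fields used in this sketch.
const statisticsLine =
  '2024-05-22T22:34:24.274Z INFO  Statistics: PlaywrightCrawler request statistics: ' +
  '{"requestsFinishedPerMinute":2,"requestsFailedPerMinute":0,"requestsTotal":4,"retryHistogram":[4]}';
const poolLine =
  '2024-05-22T22:34:24.301Z INFO  PlaywrightCrawler:AutoscaledPool: state ' +
  '{"currentConcurrency":6,"desiredConcurrency":11,"systemStatus":{"isSystemIdle":true}}';

// Hypothetical helper: parse the JSON payload that follows the log prefix.
function parsePayload(line: string): Record<string, unknown> {
  const start = line.indexOf("{");
  if (start === -1) throw new Error("no JSON payload in log line");
  return JSON.parse(line.slice(start));
}

interface StallReport {
  handled: number;      // requests whose handling has ended (assumed meaning of requestsTotal)
  inFlight: number;     // handlers still running (assumed meaning of currentConcurrency)
  accountedFor: number; // handled + inFlight
  allFetched: boolean;  // true if every enqueued request has been fetched
}

// Hypothetical helper: compare handled + in-flight requests with the number enqueued.
function summarizeStall(statsLine: string, poolStateLine: string, enqueued: number): StallReport {
  const stats = parsePayload(statsLine);
  const pool = parsePayload(poolStateLine);
  const handled = Number(stats.requestsTotal);
  const inFlight = Number(pool.currentConcurrency);
  const accountedFor = handled + inFlight;
  return { handled, inFlight, accountedFor, allFetched: accountedFor >= enqueued };
}

// 10 is the number of URLs the original poster says were enqueued.
const report = summarizeStall(statisticsLine, poolLine, 10);
console.log(report);
```

Under those assumptions, 4 handled requests plus 6 still in flight account for all 10 enqueued URLs, which is consistent with the queue being drained while the handlers themselves never return.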

To check your issue, we would need more info.
I have shared the run with you via private message. Thanks.