Apify Discord Mirror

Why "Actor finished successfully" when it actually ERRORed?

At a glance
The post describes a crawler run that finished with a SUCCEEDED status even though the log shows an error. Community members explain that the run status is not based on the number of successful or failed requests: the FAILED status is only set when the script throws an exception or exits with a specific exit code. They suggest checking whether there are any output items after the crawler.run() call and, if there are none, explicitly setting the run status to failed.
Useful resources
Why does Apify think this completed successfully when the log shows that it had an error?


Plain Text
<snip>
2024-01-14T15:18:00.631Z ERROR PuppeteerCrawler: Request failed and reached maximum retries. TimeoutError: Waiting for selector `#video-details` failed: Waiting failed: 30000ms exceeded
2024-01-14T15:18:00.634Z     at Timeout.<anonymous> (/home/myuser/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:64:32)
2024-01-14T15:18:00.636Z     at listOnTimeout (node:internal/timers:559:17)
2024-01-14T15:18:00.639Z     at processTimers (node:internal/timers:502:7) {"id":"mQnbcE2JCeZcP3c","url":"https://studio.brightcove.com/products/videocloud/media/videos/6344609341112","method":"GET","uniqueKey":"https://studio.brightcove.com/products/videocloud/media/videos/6344609341112"}
2024-01-14T15:18:00.641Z Request https://studio.brightcove.com/products/videocloud/media/videos/6344609341112 failed too many times.
2024-01-14T15:18:00.752Z INFO  PuppeteerCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2024-01-14T15:18:01.199Z INFO  PuppeteerCrawler: Crawl finished. Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[null,null,null,1],"requestAvgFailedDurationMillis":43283,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":43283,"requestsTotal":1,"crawlerRuntimeMillis":193388}
2024-01-14T15:18:01.202Z INFO  PuppeteerCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: Waiting for selector `#video-details` failed: Waiting failed: 30000ms exceeded (/home/myuser/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:64:32)"]}
2024-01-14T15:18:01.203Z Crawler finished.
2024-01-14T15:18:01.472Z INFO  Actor finished successfully (exit code 0)
4 comments
Hello,

the run status is, by default, not based on the number of successful or failed requests. The FAILED status is set if the Actor throws an exception, calls the Actor.fail() method, or passes a specific exitCode to Actor.exit().
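The mechanics above can be sketched in Node.js (a minimal sketch assuming the Apify SDK v3; `runStatus`, `finishRun`, and `hadFatalError` are hypothetical names, not part of the SDK):

```javascript
// The platform derives the run status from the process exit code,
// not from request statistics: 0 → SUCCEEDED, anything else → FAILED.
const runStatus = (exitCode) => (exitCode === 0 ? 'SUCCEEDED' : 'FAILED');

// Hypothetical wrapper showing the ways a run ends up FAILED
// (not invoked here; you would call it from your Actor's main script).
async function finishRun(hadFatalError) {
    const { Actor } = await import('apify');
    if (hadFatalError) {
        // 1. Actor.fail() marks the run FAILED with a non-zero exit code, or
        await Actor.fail('Crawl did not produce usable results');
        // 2. Actor.exit(message, { exitCode: 1 }) does the same explicitly, and
        // 3. an uncaught exception in the main script also fails the run.
    } else {
        await Actor.exit(); // exit code 0 → SUCCEEDED, even with failed requests
    }
}
```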

Are you using your own Actor, or a public one?
I need to establish some kind of fail state when checking a run status via the API (/actor-runs/) ... if the crawler kept retrying and then finished because the selector was not on the page, I need to pick that up as a fail flag somehow, even though the script otherwise handles it adequately.
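For the API side of this, one way to derive a fail flag is to read the run object from the actor-runs endpoint and inspect its `status` field (a sketch; `isFailedRun` and `checkRun` are hypothetical helpers, and the token handling is simplified):

```javascript
// Treat any terminal status other than SUCCEEDED
// (e.g. FAILED, ABORTED, TIMED-OUT) as a failure.
const isFailedRun = (run) => run.status !== 'SUCCEEDED';

// Fetch a run from the Apify API and flag it (Node 18+, global fetch).
async function checkRun(runId, token) {
    const res = await fetch(
        `https://api.apify.com/v2/actor-runs/${runId}?token=${token}`
    );
    const { data } = await res.json(); // API responses wrap the run in `data`
    return isFailedRun(data);
}
```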
I would suggest checking whether there are any output items after the crawler.run() method. If there are none, you can use any of the methods described above to change the run status to failed.
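The suggested check might look like this (a minimal sketch assuming the Apify SDK v3; the crawler setup is omitted, and `shouldFailRun` and `runAndVerify` are hypothetical names):

```javascript
// Treat a run that produced no output items as a failure.
const shouldFailRun = (itemCount) => itemCount === 0;

// After crawler.run(), inspect the default dataset and fail explicitly
// if it is empty (pass in your configured PuppeteerCrawler instance).
async function runAndVerify(crawler) {
    const { Actor } = await import('apify');
    await Actor.init();
    await crawler.run();

    const dataset = await Actor.openDataset();
    const { items } = await dataset.getData();
    if (shouldFailRun(items.length)) {
        // Sets the run status to FAILED via a non-zero exit code.
        await Actor.fail('Crawler finished but produced no output items');
    }
    await Actor.exit();
}
```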