Apify and Crawlee Official Forum

How to close the crawler from a RequestHandler?

Hey folks, I want to stop the scraper/crawler if I hit some arbitrary condition. Is there a way to do so from inside the RequestHandler? The closest function I found is crawler.teardown(), but it can't be executed inside a handler.

12 comments

Alexey Udovydchenko

Instead of await crawler.run(), call crawler.run() without awaiting it, then call teardown from your own code outside the crawler once your condition or event occurs.
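
A minimal sketch of the suggestion above: start run() without awaiting it, poll a condition from outside the crawler, and call teardown() once it holds. `CrawlerLike`, `runUntil`, `sleep`, and the polling interval are names invented for this sketch so that it runs without Crawlee installed; only run() and teardown() come from the thread. Whether tearing down mid-run shuts everything down cleanly is not confirmed here, so treat it as a starting point.

```typescript
// Assumed minimal shape: just the two crawler methods mentioned in the thread.
interface CrawlerLike {
    run(): Promise<unknown>;
    teardown(): Promise<void>;
}

// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Starts the crawl without awaiting it, polls `shouldStop`, and tears the
// crawler down as soon as the condition becomes true.
async function runUntil(
    crawler: CrawlerLike,
    shouldStop: () => boolean,
    pollMs = 250,
): Promise<'finished' | 'stopped'> {
    const status = { finished: false, failed: false, error: undefined as unknown };

    // Deliberately not awaited: the crawl proceeds while the loop below polls.
    void crawler.run().then(
        () => { status.finished = true; },
        (err: unknown) => { status.finished = true; status.failed = true; status.error = err; },
    );

    while (!status.finished) {
        if (shouldStop()) {
            await crawler.teardown();
            return 'stopped';
        }
        await sleep(pollMs);
    }

    // Surface a crawl failure instead of swallowing it.
    if (status.failed) throw status.error;
    return 'finished';
}
```

With a real crawler this might be called as `await runUntil(crawler, () => flags.stop)`, where `flags.stop` is a hypothetical shared flag that a request handler sets when its condition is met.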

AltairSama2

The issue is that the conditions are triggered in specific routes of a site.

AltairSama2

For example, we have a resume function in our Selenium scrapers that checks for duplicates; if some number n of them appear in a row, we stop scraping, assuming the rest of the data has already been scraped too.
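
The "n duplicates in a row" check described above could be sketched roughly as follows. `DuplicateStreak`, `record`, and the idea of keying items by a string are assumptions made for illustration; they are not part of Crawlee or of the poster's actual resume function.

```typescript
// Tracks how many already-seen item keys have appeared consecutively.
class DuplicateStreak {
    private readonly seen = new Set<string>();
    private readonly limit: number;
    private streak = 0;

    // `alreadyScraped` holds keys from a previous run (hypothetical input).
    constructor(limit: number, alreadyScraped: Iterable<string> = []) {
        this.limit = limit;
        for (const key of alreadyScraped) this.seen.add(key);
    }

    // Records one item key. Returns true once `limit` duplicates have
    // appeared in a row; any new item resets the streak.
    record(key: string): boolean {
        if (this.seen.has(key)) {
            this.streak += 1;
        } else {
            this.seen.add(key);
            this.streak = 0;
        }
        return this.streak >= this.limit;
    }
}
```

A handler would call `record()` for each scraped item and set a stop flag (or otherwise end the crawl) when it returns true.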

AltairSama2

Plus, we have a couple of other such conditions. It would be helpful if something like this were available inside the request handlers.

AltairSama2

Similar question: how do I stop the request handler flow if some condition is satisfied? E.g. if some element is not present, stop the function right there. Will a simple return; suffice, since we don't return anything anyway and just enqueue links?

AltairSama2

Can someone help with this? We could really use this functionality to avoid duplicate/redundant scrapes, etc.

AltairSama2

Or is there a way to empty out the request queue? I think this might work, since the crawler will stop as soon as it sees there's nothing left to scrape.

Lukas Krivka

For reference, this was answered here: https://discord.com/channels/801163717915574323/1075487274424352888/1179490876850974810

Lukas Krivka

If you want to stop the request handler itself, you need to have a condition at that point. JS doesn't allow cancelling functions/promises from the outside.
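
As a rough illustration of "a condition at that point": because nothing outside can cancel the handler, the handler checks a shared flag itself and returns early. A plain `return` also covers the missing-element case asked about earlier. `crawlState`, `SketchContext`, and `listingHandler` are hypothetical names for this sketch; a real Crawlee handler context has a different, much larger shape.

```typescript
// Shared flag that any handler can set; every handler checks it before working.
const crawlState = { stopRequested: false };

// Hypothetical, trimmed-down stand-in for a handler context.
interface SketchContext {
    url: string;
    itemLinks: string[];               // links found on the page (empty if the element is missing)
    enqueue: (urls: string[]) => void; // stand-in for enqueueing links
}

async function listingHandler(ctx: SketchContext): Promise<void> {
    // Cooperative cancellation: the handler has to check the flag itself.
    if (crawlState.stopRequested) return;

    // A plain `return` ends the handler when the expected element is absent.
    if (ctx.itemLinks.length === 0) return;

    ctx.enqueue(ctx.itemLinks);
}
```

Handlers that are already running when the flag is set only stop at their next check, so long handlers may want more than one guard.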

AltairSama2

Yeah, that should not be an issue, since we can control the number of requests via maxConcurrency.

AltairSama2

Thanks!
