Apify and Crawlee Official Forum

How to close the crawler from a RequestHandler?

Hey folks, I want to stop the scraper/crawler if I hit some arbitrary condition. Is there a way I can do so from inside the RequestHandler? The closest function I found is crawler.teardown(), but it can't be executed inside a handler.
Instead of await crawler.run(), just call crawler.run() without awaiting it, and then call teardown() when your condition or event is handled by your own code outside of the crawler.
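A minimal sketch of that approach, assuming a CheerioCrawler; the EventEmitter and the 'stop-crawler' event name are my own additions here, not part of Crawlee:

```typescript
import { EventEmitter } from 'node:events';
import { CheerioCrawler } from 'crawlee';

// Hypothetical emitter used to signal the stop condition from the handler.
const signals = new EventEmitter();

const crawler = new CheerioCrawler({
    async requestHandler({ $ }) {
        // ... scrape the page ...
        // When the condition is hit, signal the outside world instead of
        // tearing the crawler down from inside the handler.
        if (/* your condition */ false) {
            signals.emit('stop-crawler');
        }
    },
});

// No await here, so the code below runs while the crawler works.
const finished = crawler.run(['https://example.com']);

signals.once('stop-crawler', async () => {
    // teardown() aborts the run; requests already in flight may still finish.
    await crawler.teardown();
});

await finished;
```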
The issue is that the conditions are triggered in specific routes of a site.
For example, we have a resume function in our Selenium scrapers that checks for duplicates; if some n of them appear in a row, we stop scraping, assuming the rest of the data has already been scraped too.
Plus we have a couple of other such conditions, so it would be helpful if something like this were available inside the request handlers.
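A rough sketch of what that "n duplicates in a row" rule could look like inside a Crawlee request handler, reusing the stop-crawler signal from the sketch above; the threshold and the way an item is identified are placeholders:

```typescript
import { EventEmitter } from 'node:events';
import { CheerioCrawler } from 'crawlee';

const signals = new EventEmitter();

// Hypothetical threshold for the "n duplicates in a row" rule.
const MAX_DUPLICATES_IN_A_ROW = 20;
const seen = new Set<string>();
let duplicatesInARow = 0;

const crawler = new CheerioCrawler({
    async requestHandler({ $ }) {
        // Placeholder identity check: however you identify an item on the page.
        const id = $('h1').first().text();
        if (seen.has(id)) {
            duplicatesInARow += 1;
        } else {
            duplicatesInARow = 0;
            seen.add(id);
            // ... scrape and push the new item ...
        }
        // Enough duplicates in a row: assume the rest was already scraped.
        if (duplicatesInARow >= MAX_DUPLICATES_IN_A_ROW) {
            signals.emit('stop-crawler');
        }
    },
});
```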
Similar question: how do I stop the request handler flow if some condition is satisfied? E.g., if some element is not present, stop the function right there. Will a simple return; suffice, since we don't return anything anyway and just enqueue links?
Can someone help with this? I could really use this functionality to avoid duplicate/redundant scrapes, etc.
Or is there a way we can empty out the request queue? I think this might work, since the crawler would stop as soon as it sees there's nothing left to scrape.
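For what it's worth, a sketch of that queue-emptying idea using Crawlee's getRequestQueue() and drop(); this is untested here, and whether the crawler shuts down cleanly after its queue is dropped mid-run isn't confirmed in this thread:

```typescript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ crawler }) {
        // ... scrape ...
        if (/* your condition */ false) {
            // drop() deletes the whole queue, so nothing is left to process.
            const queue = await crawler.getRequestQueue();
            await queue.drop();
        }
    },
});

await crawler.run(['https://example.com']);
```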
If you want to stop the request handler itself, you need to have a condition at that point; JS doesn't allow cancelling functions/promises from the outside.
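So for the return; part of the question: yes, an early return ends the handler for the current request only, and the crawler moves on with the rest of the queue. A small sketch with a made-up selector:

```typescript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, enqueueLinks }) {
        // Guard clause: if the element we need is missing, stop this
        // handler invocation right here. Only this request is affected;
        // the crawler keeps processing the queue.
        if ($('.product-list').length === 0) {
            return;
        }
        await enqueueLinks({ selector: '.product-list a' });
    },
});
```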
Yeah, that should not be an issue, since we can control the number of requests via maxConcurrency.