Apify Discord Mirror

Updated 5 months ago

Stop scraping in the middle of a route handler if some condition is met?

At a glance
The community member wants to stop scraping partway through a route handler when a condition is met, because the full handler is computationally expensive. They use return to exit the handler early, but the Node.js process hangs indefinitely after scraping is done; a previous thread had tied a similar hang to crawler.teardown(), yet the problem now appears even in handlers that never call crawler.teardown(). Another community member replies that the hang has nothing to do with return, and that guarding expensive work with if statements and an early return is the way to go. After sharing their route handler and asking for debugging tips, the original poster discovers the hang is caused by Mongoose keeping the process alive rather than by the crawler library.
hey folks, basically the title, I want to stop scraping in the middle of a route handler if some condition is met because the full function is a little expensive computationally.

I am using return; to exit the route handler when some condition is met, but I am facing issues with the node process hanging indefinitely after scraping is done. I had a previous thread about crawler.teardown(), where removing the return statement stopped the hanging issue.

But even in handlers where I am not calling crawler.teardown(), I am facing hanging issues.

is there a better way to accomplish this?
10 comments
A hanging process has nothing to do with return. Yes, having if statements with return is the way to go.
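To illustrate that pattern, here is a minimal sketch assuming a Crawlee CheerioCrawler; the start URL and the title check are hypothetical stand-ins for a real stop condition:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, request, log }) {
        // Guard the expensive work: `return` only ends this one
        // handler invocation; it does not stop the crawler and it
        // cannot hang the process by itself.
        if (!$('title').text().includes('product')) {
            log.info(`Stop condition met, skipping ${request.url}`);
            return;
        }
        // ... computationally expensive extraction continues here ...
    },
});

await crawler.run(['https://example.com']);
```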
can you give me some ideas about how I can go about debugging this? or any potential fixes I can possibly implement. would appreciate any help you can give
Does it hang after the crawler finishes? Why would you use teardown and not let it process the requests till the end?
no, the teardown use case was different: I had to open a couple of pages in the route itself to grab info and wanted to skip that to avoid spamming the site if possible, and also to stop the crawler ASAP. but our main use case was to have some kind of "resume" for it.
in the current use case I'm not calling teardown anywhere
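For contrast with the early-return pattern above, aborting the entire run from inside a handler goes through crawler.teardown(), which Crawlee exposes on the handler context. In this sketch the .sold-out selector is a hypothetical stop condition:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, crawler, log }) {
        // Hypothetical condition: once this is hit, stop the whole
        // run (no further requests are processed), not just this handler.
        if ($('.sold-out').length > 0) {
            log.info('Stop condition met, tearing the crawler down');
            await crawler.teardown();
            return;
        }
        // ... normal scraping work ...
    },
});
```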
it's just a simple route and the crawler hangs after it logs the final request statistics
and this is the route handler (https://pastebin.com/8JaY9PUR); the crawler's start url only hits this route and is supposed to shut down, but it hangs indefinitely
and it's stuck here
Attachment: image.png
nvm, it's not crawlee
it's mongoose screwing around
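For anyone hitting the same thing: an open Mongoose connection keeps the Node.js event loop alive, so the process cannot exit after the crawler finishes. A minimal sketch of the fix, with a placeholder connection URI and handler body: disconnect once the run is done.

```ts
import mongoose from 'mongoose';
import { CheerioCrawler } from 'crawlee';

// Placeholder URI; an open connection keeps the event loop alive.
await mongoose.connect('mongodb://localhost:27017/scraper');

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
        // ... save results through Mongoose models here ...
    },
});

await crawler.run(['https://example.com']);

// Without this, the process hangs after the final request
// statistics are logged, exactly as described above.
await mongoose.disconnect();
```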