Apify Discord Mirror

Updated 5 months ago

Stop scraping in the middle of a route handler if some condition is met?

At a glance
The community member wants to stop scraping partway through a route handler when a condition is met, because the full handler is computationally expensive. They use return to exit the handler early, but the Node.js process hangs indefinitely after scraping is done; a previous thread had tied a similar hang to crawler.teardown(), yet the problem now appears even in handlers that never call crawler.teardown(). Another community member replies that the hang has nothing to do with return, and that guarding expensive work with if statements and an early return is the way to go. After sharing their route handler and asking for debugging tips, the original poster discovers the hang is caused by Mongoose keeping the process alive rather than by the crawler library.
hey folks, basically the title, I want to stop scraping in the middle of a route handler if some condition is met because the full function is a little expensive computationally.

I am using return; to exit the route handler when some condition is met, but I am facing issues with the node process hanging indefinitely after scraping is done. I had a previous thread about crawler.teardown(), where removing the return statement stopped the hanging issue.

But even in handlers where I am not calling crawler.teardown(), I am facing hanging issues.

is there a better way to accomplish this?
10 comments
A hanging process has nothing to do with return. Yes, having if statements with return is the way to go.
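To illustrate that pattern, here is a minimal sketch assuming a Crawlee CheerioCrawler; the start URL and the title check are hypothetical stand-ins for a real stop condition:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, request, log }) {
        // Guard the expensive work: `return` only ends this one
        // handler invocation; it does not stop the crawler and it
        // cannot hang the process by itself.
        if (!$('title').text().includes('product')) {
            log.info(`Stop condition met, skipping ${request.url}`);
            return;
        }
        // ... computationally expensive extraction continues here ...
    },
});

await crawler.run(['https://example.com']);
```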
can you give me some ideas about how I can go about debugging this? or any potential fixes I can possibly implement. would appreciate any help you can give
Does it hang after the crawler finishes? Why would you use teardown and not let it process the requests till the end?
no, the teardown use case was different: I had to open a couple of pages in the route itself to grab info and wanted to skip that to avoid spamming the site if possible, and also to stop the crawler ASAP. but our main use case was to have some kind of "resume" for it.
in the current use case I'm not calling teardown anywhere
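For contrast with the early-return pattern above, aborting the entire run from inside a handler goes through crawler.teardown(), which Crawlee exposes on the handler context. In this sketch the .sold-out selector is a hypothetical stop condition:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, crawler, log }) {
        // Hypothetical condition: once this is hit, stop the whole
        // run (no further requests are processed), not just this handler.
        if ($('.sold-out').length > 0) {
            log.info('Stop condition met, tearing the crawler down');
            await crawler.teardown();
            return;
        }
        // ... normal scraping work ...
    },
});
```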
it's just a simple route and the crawler hangs after it logs the final request statistics
and this is the route handler (https://pastebin.com/8JaY9PUR); the crawler's start url only hits this route and is supposed to shut down, but it hangs indefinitely
and it's stuck here
Attachment: image.png
nvm, it's not crawlee
it's mongoose screwing around
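For anyone hitting the same thing: an open Mongoose connection keeps the Node.js event loop alive, so the process cannot exit after the crawler finishes. A minimal sketch of the fix, with a placeholder connection URI and handler body: disconnect once the run is done.

```ts
import mongoose from 'mongoose';
import { CheerioCrawler } from 'crawlee';

// Placeholder URI; an open connection keeps the event loop alive.
await mongoose.connect('mongodb://localhost:27017/scraper');

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
        // ... save results through Mongoose models here ...
    },
});

await crawler.run(['https://example.com']);

// Without this, the process hangs after the final request
// statistics are logged, exactly as described above.
await mongoose.disconnect();
```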