First of all, configure the CRAWLEE_MEMORY_MBYTES env var to something higher.
Then also run your spider with the --max-old-space-size Node flag.
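Roughly like this, as a minimal sketch (8192 MB is just an example value, and the memoryMbytes option is assumed to mirror CRAWLEE_MEMORY_MBYTES; verify against the Configuration docs for your Crawlee version):
```ts
// Shell side: raise Crawlee's memory ceiling and V8's heap limit together.
//   CRAWLEE_MEMORY_MBYTES=8192 node --max-old-space-size=8192 main.js

import { Configuration, PlaywrightCrawler } from 'crawlee';

// In-code alternative to the env var (assumed equivalent; check your version).
const config = new Configuration({ memoryMbytes: 8192 });

const crawler = new PlaywrightCrawler(
    {
        async requestHandler({ page }) {
            // ... your scraping logic ...
        },
    },
    config, // crawler constructors accept a Configuration as the 2nd argument
);
```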
Thanks very much. I will try it.
Did you fix this? Wondering why even one browser is using 12 GB?!
It's not one browser. You usually end up with that much RAM used when you have high concurrency and forget to close pages.
Thank you for your response. Struggling over here scaling my crawlers.
Doesn't crawlee automatically close pages?
Nope.
await page.close() is your weapon
omg, it doesn't close the page after the default handler?
await page.close() at the end of your handler
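Something like this, as a minimal sketch (the extraction logic is just a placeholder):
```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxConcurrency: 10,
    async requestHandler({ page, request, pushData }) {
        // hypothetical extraction; replace with your own logic
        const title = await page.title();
        await pushData({ url: request.url, title });

        // close the page explicitly so pages don't pile up under
        // high concurrency and eat your RAM
        await page.close();
    },
});

await crawler.run(['https://example.com']);
```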
Then something else is wrong. You can use the Chrome debugger to see which objects are holding your memory.
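The rough workflow (the interval logger is just a hypothetical helper, not part of Crawlee):
```ts
// 1. Start the crawler with the inspector enabled:
//      node --inspect main.js
// 2. Open chrome://inspect in Chrome, attach to the process, and take
//    heap snapshots in the Memory tab; diff two snapshots to see which
//    objects are retained between requests.

// Optional: log memory usage over time to spot the leak's growth rate.
const logMemory = setInterval(() => {
    const { rss, heapUsed, heapTotal } = process.memoryUsage();
    const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
    console.log(`rss=${mb(rss)}MB heap=${mb(heapUsed)}/${mb(heapTotal)}MB`);
}, 30_000);
logMemory.unref(); // don't keep the process alive just for logging
```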
Is there any way to ensure that gets cleaned up?
Don't know what package you are using for that. Check docs of that particular package.
Also, I don't think those activities should be mixed. Scraping is scraping; PDF parsing is something different.
Indeed, memory usage can go bananas if you scrape and read PDFs and mix in other stuff, like unclosed connections to S3. Divide your processes and clean up the code.
Totally agree, we don't read the PDFs on the same machine. Sorry, that was convoluted. We hit a lot of PDFs, so we intercept the request and send the URL to SQS to download and parse.
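For anyone reading along, that handoff looks roughly like this (the queue URL and the *.pdf route pattern are hypothetical; adjust to your setup):
```ts
import { PlaywrightCrawler } from 'crawlee';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});
// hypothetical queue; substitute your own
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/pdf-jobs';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Intercept PDF requests: instead of downloading in the
            // browser, push the URL to SQS for a separate worker to
            // fetch and parse, then abort the browser-side request.
            await page.route('**/*.pdf', async (route) => {
                await sqs.send(new SendMessageCommand({
                    QueueUrl: QUEUE_URL,
                    MessageBody: route.request().url(),
                }));
                await route.abort();
            });
        },
    ],
    async requestHandler({ page }) {
        // normal HTML scraping here
        await page.close();
    },
});
```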
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
would love to chat and even pay you to look over w/ me. Would you mind getting on a call with me?
What time is it at your place?
We can do it tomorrow or another time.
It's 4am here. If we can do it in 5-6 hours, that would be great for me.
Kids are sleeping and it's hard to have a call at this hour
haha, understandable! Up at 4am!
I'll add you and we can figure it out. Thank you.