First of all, configure the CRAWLEE_MEMORY_MBYTES env var to something higher.
Then also run your spider with the --max-old-space-size Node flag.
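Roughly like this, as a minimal sketch (8192 MB is just an example value, and the memoryMbytes option is assumed to mirror CRAWLEE_MEMORY_MBYTES; verify against the Configuration docs for your Crawlee version):
```ts
// Shell side: raise Crawlee's memory ceiling and V8's heap limit together.
//   CRAWLEE_MEMORY_MBYTES=8192 node --max-old-space-size=8192 main.js

import { Configuration, PlaywrightCrawler } from 'crawlee';

// In-code alternative to the env var (assumed equivalent; check your version).
const config = new Configuration({ memoryMbytes: 8192 });

const crawler = new PlaywrightCrawler(
    {
        async requestHandler({ page }) {
            // ... your scraping logic ...
        },
    },
    config, // crawler constructors accept a Configuration as the 2nd argument
);
```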
Thanks very much. I will try it.
Did you fix this? Wondering why even one browser is using 12 GB?!
It's not one browser. You usually end up with that much RAM used when you have high concurrency and forget to close pages.
Thank you for your response. Struggling over here scaling my crawlers.
Doesn't crawlee automatically close pages?
Nope.
await page.close() is your weapon
omg, it doesn't close the page after the default handler?
await page.close() at the end of your handler
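Something like this, as a minimal sketch (the extraction logic is just a placeholder):
```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxConcurrency: 10,
    async requestHandler({ page, request, pushData }) {
        // hypothetical extraction; replace with your own logic
        const title = await page.title();
        await pushData({ url: request.url, title });

        // close the page explicitly so pages don't pile up under
        // high concurrency and eat your RAM
        await page.close();
    },
});

await crawler.run(['https://example.com']);
```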
Then something else is wrong. You can use the Chrome debugger to see which objects are holding your memory.
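The rough workflow (the interval logger is just a hypothetical helper, not part of Crawlee):
```ts
// 1. Start the crawler with the inspector enabled:
//      node --inspect main.js
// 2. Open chrome://inspect in Chrome, attach to the process, and take
//    heap snapshots in the Memory tab; diff two snapshots to see which
//    objects are retained between requests.

// Optional: log memory usage over time to spot the leak's growth rate.
const logMemory = setInterval(() => {
    const { rss, heapUsed, heapTotal } = process.memoryUsage();
    const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
    console.log(`rss=${mb(rss)}MB heap=${mb(heapUsed)}/${mb(heapTotal)}MB`);
}, 30_000);
logMemory.unref(); // don't keep the process alive just for logging
```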
Is there any way to ensure that gets cleaned up?
Don't know what package you are using for that. Check docs of that particular package.
Also, I don't think those activities should be mixed. Scraping is scraping; PDF parsing is something different.
Indeed, memory usage can go bananas if you scrape and read PDFs and mix in other stuff, like unclosed connections to S3. Divide your processes and clean up the code.
Totally agree, we don't read the PDFs on the same machine. Sorry, that was convoluted. We hit a lot of PDFs, so we intercept the request and send the URL to SQS to download and parse.
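For anyone reading along, that handoff looks roughly like this (the queue URL and the *.pdf route pattern are hypothetical; adjust to your setup):
```ts
import { PlaywrightCrawler } from 'crawlee';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});
// hypothetical queue; substitute your own
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/pdf-jobs';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Intercept PDF requests: instead of downloading in the
            // browser, push the URL to SQS for a separate worker to
            // fetch and parse, then abort the browser-side request.
            await page.route('**/*.pdf', async (route) => {
                await sqs.send(new SendMessageCommand({
                    QueueUrl: QUEUE_URL,
                    MessageBody: route.request().url(),
                }));
                await route.abort();
            });
        },
    ],
    async requestHandler({ page }) {
        // normal HTML scraping here
        await page.close();
    },
});
```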
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
would love to chat and even pay you to look over w/ me. Would you mind getting on a call with me?
What time is it at your place?
We can do it tomorrow or another time.
It's 4am here. If we can do it in 5-6 hours, that would be great for me.
Kids are sleeping and it's hard to have a call at this hour
haha, understandable! Up at 4am!
I'll add you and we can figure it out. Thank you.