Apify Discord Mirror

Updated 5 months ago

Trying to optimize autoscale options

At a glance

The community member is running a scraper on AWS ECS with 16 GB of memory and what they describe as "8gb cpu" (presumably 8 vCPUs), and reports average CPU and memory usage of around 88%. They have set the configuration parameters maxConcurrency, maxRequestsPerCrawl, maxRequestRetries, and requestHandlerTimeoutSecs. Community members discuss ways to optimize the scraper's performance, but there is no explicitly marked answer. Some comments note that optimization depends on the sites being scraped and on the scraper's setup, and provide links to documentation on scaling crawlers.

Hello,

I am running my scraper on an AWS 8gb cpu, 16gb memory ecs.

    maxConcurrency: 200,
    maxRequestsPerCrawl: 500,
    maxRequestRetries: 2,
    requestHandlerTimeoutSecs: 185,


Right now the avg cpu and mem are both like 88%. Is there anything I can do here to optimize more?

I also have CRAWLEE_AVAILABLE_MEMORY_RATIO=.8
9 comments
Hi, this is a case-by-case thing. It depends heavily on the sites being scraped, whether you are using a browser, the browser settings, and so on.
any guidelines?
also cpu seems to hit 99% no matter what
{"time":"2024-04-11T04:57:31.174Z","level":"INFO","msg":"PuppeteerCrawler:AutoscaledPool: state","scraper":"web","currentConcurrency":18,"desiredConcurrency":17,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":true,"limitRatio":0.6,"actualRatio":0.736},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}    
what does eventLoop overloaded mean?
or how come currentConcurrency is 18 when I have maxConcurrency at 200 and there are plenty of requests?
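A note on reading that log line: each entry under systemStatus says whether that resource was judged overloaded, and here only eventLoopInfo is (actualRatio 0.736 against limitRatio 0.6). As far as I understand Crawlee's AutoscaledPool, maxConcurrency is only an upper bound, and the pool raises concurrency only while none of these resources is flagged as overloaded. That would explain why concurrency hovers around 18. An overloaded event loop generally means the Node.js process itself is too busy to respond promptly. Below is a minimal TypeScript sketch for pulling the overloaded entries out of such a line. The function overloadedResources and the interface names are hypothetical helpers, not part of Crawlee, and the sample is the log line posted above.

```typescript
// Shape of one resource entry under "systemStatus" in the log line above.
interface ResourceInfo {
    isOverloaded: boolean;
    limitRatio: number;
    actualRatio: number;
}

// Only the fields this sketch reads; names are taken from the posted log line.
interface PoolState {
    currentConcurrency: number;
    desiredConcurrency: number;
    systemStatus: {
        isSystemIdle: boolean;
        memInfo: ResourceInfo;
        eventLoopInfo: ResourceInfo;
        cpuInfo: ResourceInfo;
        clientInfo: ResourceInfo;
    };
}

// Hypothetical helper (not a Crawlee API): lists the resources flagged as overloaded.
function overloadedResources(logLine: string): string[] {
    const state = JSON.parse(logLine) as PoolState;
    const status = state.systemStatus;
    const resources: Array<[string, ResourceInfo]> = [
        ["memInfo", status.memInfo],
        ["eventLoopInfo", status.eventLoopInfo],
        ["cpuInfo", status.cpuInfo],
        ["clientInfo", status.clientInfo],
    ];
    return resources
        .filter((entry) => entry[1].isOverloaded)
        .map((entry) => `${entry[0]} (${entry[1].actualRatio} > ${entry[1].limitRatio})`);
}

// The log line from the thread, copied verbatim.
const sampleLine = '{"time":"2024-04-11T04:57:31.174Z","level":"INFO","msg":"PuppeteerCrawler:AutoscaledPool: state","scraper":"web","currentConcurrency":18,"desiredConcurrency":17,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":true,"limitRatio":0.6,"actualRatio":0.736},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}';

console.log(overloadedResources(sampleLine));
// prints [ 'eventLoopInfo (0.736 > 0.6)' ]
```

If the event loop is the only entry that ever shows up here, the bottleneck is most likely work done inside the Node.js process (for example heavy parsing in the request handler) rather than memory or the Apify client.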