AutoscaledPool trying to scale up without sufficient memory

At a glance

The community member is running a Playwright crawler and experiencing stability issues. The AutoscaledPool is trying to increase concurrency, but the crawler is running into memory issues, causing the pod to restart due to the 4GB RAM limit. The comments suggest that the AutoscaledPool doesn't ensure the memory never goes above the limit, and the community member should either limit the maximum concurrency or adjust the AutoscaledPoolOptions to reduce memory scaling. The community member eventually found that adjusting the snapshotter used memory ratio helped improve the performance.

Crafty

Hi All,

I'm running a Playwright crawler and am running into a bit of an issue with crawler stability. Have a look at these two log messages:

{
  "service": "AutoscaledPool",
  "time": "2024-10-30T16:42:17.049Z",
  "id": "cae4950d568a4b8bac375ffa5a40333c",
  "jobId": "9afee408-42bf-4194-b17c-9864db707e5c",
  "currentConcurrency": "4",
  "desiredConcurrency": "5",
  "systemStatus": "{\"isSystemIdle\":true,\"memInfo\":{\"isOverloaded\":false,\"limitRatio\":0.2,\"actualRatio\":0},\"eventLoopInfo\":{\"isOverloaded\":false,\"limitRatio\":0.6,\"actualRatio\":0},\"cpuInfo\":{\"isOverloaded\":false,\"limitRatio\":0.4,\"actualRatio\":0},\"clientInfo\":{\"isOverloaded\":false,\"limitRatio\":0.3,\"actualRatio\":0}}"
}

The AutoscaledPool is trying to increase its concurrency from 4 to 5 because, in its view, the system was idle. 20 seconds later, though:

{
  "rejection": "true",
  "date": "Wed Oct 30 2024 16:42:38 GMT+0000 (Coordinated Universal Time)",
  "process": "{\"pid\":1,\"uid\":997,\"gid\":997,\"cwd\":\"/home/myuser\",\"execPath\":\"/usr/local/bin/node\",\"version\":\"v22.9.0\",\"argv\":[\"/usr/local/bin/node\",\"/home/myuser/FIDO-Scraper-Discovery\"],\"memoryUsage\":{\"rss\":337043456,\"heapTotal\":204886016,\"heapUsed\":168177928,\"external\":30148440,\"arrayBuffers\":14949780}}",
  "os": "{\"loadavg\":[3.08,3.38,3.68],\"uptime\":312222.44}",
  "stack": "response.headerValue: Target page, context or browser has been closed\n    at Page.<anonymous> (/home/myuser/FIDO-Scraper-Discovery/dist/articleImagesPreNavHook.js:15:60)"
}

which suggests memory was much tighter than the AutoscaledPool was assuming, likely due to the additional RAM that Chromium was using. Crawlee was running in a k8s pod with a 4GB RAM limit. Is this behaviour intended, and how might I improve my performance? Does the AutoscaledPool account for how much RAM is actually in use, or just how much the Node process uses?
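To make the gap in the two log messages concrete, here is a minimal TypeScript sketch that decodes the escaped `systemStatus` and `process` strings (abridged from the logs above) and compares the Node.js RSS with the pod limit. The interface names and the reading of "4GB" as 4096 MiB are assumptions made for illustration. The RSS works out to roughly 320 MiB, a small share of the pod limit, and `memoryUsage` only describes the Node.js process itself, not the separate Chromium processes.

```typescript
// Escaped JSON strings abridged from the two log messages above.
const systemStatusRaw =
  '{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0}}';
const processRaw =
  '{"pid":1,"memoryUsage":{"rss":337043456,"heapTotal":204886016,"heapUsed":168177928}}';

// Hypothetical type names, used only so this sketch type-checks.
interface RatioInfo {
  isOverloaded: boolean;
  limitRatio: number;
  actualRatio: number;
}
interface SystemStatusLog {
  isSystemIdle: boolean;
  memInfo: RatioInfo;
}
interface ProcessLog {
  pid: number;
  memoryUsage: { rss: number; heapTotal: number; heapUsed: number };
}

const systemStatus = JSON.parse(systemStatusRaw) as SystemStatusLog;
const processInfo = JSON.parse(processRaw) as ProcessLog;

const MIB = 1024 * 1024;
const podLimitMib = 4096; // assumption: the 4GB pod limit is 4 GiB

// Resident set size of the Node.js process only (no browser processes).
const nodeRssMib = processInfo.memoryUsage.rss / MIB;
const nodeShareOfPod = nodeRssMib / podLimitMib;

console.log(`memInfo.isOverloaded: ${systemStatus.memInfo.isOverloaded}`);
console.log(
  `Node RSS: ${nodeRssMib.toFixed(0)} MiB ` +
    `(${(nodeShareOfPod * 100).toFixed(1)}% of the pod limit)`,
);
```

Comparing this figure with the pod-level memory reported by Kubernetes would show how much of the limit is being consumed outside the Node.js process.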

5 comments

Crafty

Here's a log export from my service. After this, the pod auto-restarts due to the memory limit.

Marco

The AutoscaledPool doesn't ensure that memory never goes above the limit; it just doesn't scale up to more requests when memory is close to it. So if there is a sudden memory spike, such as on a very heavy page, it can still cause trouble. You can either limit maxConcurrency or play with the autoscaledPoolOptions to reduce memory scaling.
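As a rough illustration of the two options mentioned here, the following TypeScript sketch builds a crawler options object with a hard concurrency cap and a slower autoscaling cadence. The option names (`maxConcurrency`, `autoscaledPoolOptions.desiredConcurrency`, `autoscaledPoolOptions.autoscaleIntervalSecs`) follow the Crawlee API reference, but the interfaces are local stand-ins so the snippet runs without Crawlee installed, and every number is an illustrative assumption rather than a recommended value.

```typescript
// Local stand-ins for the subset of Crawlee option names used here.
interface PoolOptionsSketch {
  desiredConcurrency?: number;
  autoscaleIntervalSecs?: number;
}
interface CrawlerOptionsSketch {
  maxConcurrency?: number;
  autoscaledPoolOptions?: PoolOptionsSketch;
}

const crawlerOptions: CrawlerOptionsSketch = {
  // Hard cap: the pool never runs more than this many pages in parallel,
  // however idle the system looks. The value 4 is an assumption.
  maxConcurrency: 4,
  autoscaledPoolOptions: {
    // Start low and let the pool grow toward the cap.
    desiredConcurrency: 2,
    // Re-evaluate concurrency less often, so a heavy page has more time
    // to show up in the memory snapshots before the next scale-up.
    autoscaleIntervalSecs: 30,
  },
};

// With Crawlee installed, the object would be spread into the crawler:
// import { PlaywrightCrawler } from 'crawlee';
// const crawler = new PlaywrightCrawler({ ...crawlerOptions, requestHandler });

console.log(JSON.stringify(crawlerOptions, null, 2));
```

A fixed cap is the blunter tool of the two: it trades some throughput for a predictable upper bound on how many browser pages can be open at once.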

Crafty

But it seems to me that the pool was still trying to scale up, even while there was no extra memory to be had?

Pepa J

Hi @Crafty, if the default settings don't work for you, you may adjust the ratios for scaling up via https://crawlee.dev/api/core/interface/AutoscaledPoolOptions in the crawler options.

Crafty

Thanks for these. Eventually I found the snapshotter used memory ratio and turned it down. 🙂
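The setting referred to here is presumably `maxUsedMemoryRatio` in the pool's `snapshotterOptions`. Crafty did not post the final configuration, so the sketch below is an assumption about what "turning it down" could look like, with 0.5 as an arbitrary example value. The helper `isMemorySnapshotOverloaded` is hypothetical and only models the basic idea (a snapshot counts as overloaded once used memory exceeds the ratio of the limit); it is not Crawlee's actual implementation.

```typescript
// Local stand-ins for the Crawlee option names used here.
interface SnapshotterOptionsSketch {
  maxUsedMemoryRatio?: number;
}
interface MemoryPoolOptionsSketch {
  snapshotterOptions?: SnapshotterOptionsSketch;
}

const autoscaledPoolOptions: MemoryPoolOptionsSketch = {
  snapshotterOptions: {
    // Lower ratio: memory is treated as overloaded earlier, so the pool
    // stops scaling up sooner. 0.5 is an illustrative assumption.
    maxUsedMemoryRatio: 0.5,
  },
};

// Hypothetical, simplified model of the per-snapshot overload check.
function isMemorySnapshotOverloaded(
  usedMb: number,
  memoryLimitMb: number,
  maxUsedMemoryRatio: number,
): boolean {
  return usedMb / memoryLimitMb > maxUsedMemoryRatio;
}

// Example: 3000 MB used out of a 4096 MB limit (numbers are made up).
const ratio = autoscaledPoolOptions.snapshotterOptions?.maxUsedMemoryRatio ?? 0.9;
console.log(isMemorySnapshotOverloaded(3000, 4096, ratio));

// With Crawlee installed, the object would be passed as:
// new PlaywrightCrawler({ autoscaledPoolOptions, requestHandler });
```

In this simplified model, the same 3000 MB reading is flagged as overloaded at a ratio of 0.5 but not at 0.9, which is why lowering the ratio leaves more headroom for sudden spikes.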

Apify Discord Mirror