Hi All,
I'm running a Playwright crawler and am running into a bit of an issue with crawler stability. Have a look at these two log messages:
{
"service": "AutoscaledPool",
"time": "2024-10-30T16:42:17.049Z",
"id": "cae4950d568a4b8bac375ffa5a40333c",
"jobId": "9afee408-42bf-4194-b17c-9864db707e5c",
"currentConcurrency": "4",
"desiredConcurrency": "5",
"systemStatus": "{\"isSystemIdle\":true,\"memInfo\":{\"isOverloaded\":false,\"limitRatio\":0.2,\"actualRatio\":0},\"eventLoopInfo\":{\"isOverloaded\":false,\"limitRatio\":0.6,\"actualRatio\":0},\"cpuInfo\":{\"isOverloaded\":false,\"limitRatio\":0.4,\"actualRatio\":0},\"clientInfo\":{\"isOverloaded\":false,\"limitRatio\":0.3,\"actualRatio\":0}}"
}
The AutoscaledPool is trying to increase its concurrency from 4 to 5, since in its view the system was idle. Twenty seconds later, though:
{
"rejection": "true",
"date": "Wed Oct 30 2024 16:42:38 GMT+0000 (Coordinated Universal Time)",
"process": "{\"pid\":1,\"uid\":997,\"gid\":997,\"cwd\":\"/home/myuser\",\"execPath\":\"/usr/local/bin/node\",\"version\":\"v22.9.0\",\"argv\":[\"/usr/local/bin/node\",\"/home/myuser/FIDO-Scraper-Discovery\"],\"memoryUsage\":{\"rss\":337043456,\"heapTotal\":204886016,\"heapUsed\":168177928,\"external\":30148440,\"arrayBuffers\":14949780}}",
"os": "{\"loadavg\":[3.08,3.38,3.68],\"uptime\":312222.44}",
"stack": "response.headerValue: Target page, context or browser has been closed\n at Page.<anonymous> (/home/myuser/FIDO-Scraper-Discovery/dist/articleImagesPreNavHook.js:15:60)"
}
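For context, the rejection comes from a pre-navigation hook of mine that reads a response header, and the page gets torn down mid-flight. As a stopgap I could wrap that call so the rejection can't escape as an unhandled rejection; a minimal sketch (the wrapper name `safeCall` is my own, and I'm only matching on the "has been closed" message Playwright throws):

```typescript
// safeCall: run an async page/response operation, swallowing Playwright's
// "Target page, context or browser has been closed" rejection so a torn-down
// page can't crash the process via an unhandled rejection.
async function safeCall<T>(fn: () => Promise<T>): Promise<T | null> {
  try {
    return await fn();
  } catch (err) {
    if (err instanceof Error && /has been closed/.test(err.message)) {
      return null; // page/context/browser was closed mid-flight; treat value as missing
    }
    throw err; // anything else is a real bug, so rethrow it
  }
}
```

In the hook it would be used roughly like `const ct = await safeCall(() => response.headerValue('content-type'));` with a null check afterwards, but that only hides the symptom; the underlying question is why the browser is getting killed in the first place.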
which suggests memory was much tighter than the AutoscaledPool was accounting for, likely due to the additional RAM that Chromium was using. Crawlee was running in a Kubernetes pod with a 4 GB RAM limit. Is this behaviour intended, and how might I improve my performance? Does the AutoscaledPool account for how much RAM is actually in use, or just how much the Node process uses?
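For what it's worth, if I'm reading the Configuration docs right, Crawlee only budgets a fraction of the memory it detects (`availableMemoryRatio`, default 0.25) unless you override it with `CRAWLEE_MEMORY_MBYTES`. My back-of-the-envelope for this pod, assuming it detects the full pod limit (which may be wrong if it reads host memory instead of the cgroup limit):

```typescript
// Back-of-the-envelope memory budget under my assumptions, not from the logs:
// Crawlee compares its memory snapshots against availableMemoryRatio times
// the memory it detects, so a 4 GB pod leaves it a much smaller working budget.
const podLimitMb = 4096;            // k8s pod memory limit
const availableMemoryRatio = 0.25;  // Crawlee's documented default
const budgetMb = podLimitMb * availableMemoryRatio;
console.log(budgetMb); // 1024 MB budget, shared with however much of Chromium it can see
```

So presumably I could either set `CRAWLEE_MEMORY_MBYTES` explicitly to match the pod, or just cap `maxConcurrency` on the crawler so it never scales into the danger zone, but I'd like to understand what the Snapshotter actually measures (Node's own usage vs. the Chromium child processes) before picking one.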