Checking top, there are a bunch of long-running Chrome processes that haven't been killed.

{"time":"2024-05-20T03:04:41.809Z","level":"WARNING","msg":"PuppeteerCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 16268 MB of 14071 MB (116%). Consider increasing available memory.","scraper":"web","url":"https://www.natronacounty-wy.gov/845/LegalPublic-Notices","place_id":"65a603fac769fa16f6596a8f"}
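If the leftover Chrome processes come from browsers that never get retired, the browser pool can be told to recycle them more aggressively. A minimal sketch, assuming Crawlee's PuppeteerCrawler and its browserPoolOptions; the numbers are illustrative, not tuned values:

import { PuppeteerCrawler } from 'crawlee'

const crawler = new PuppeteerCrawler({
    browserPoolOptions: {
        // recycle each Chrome process after it has served this many pages
        retireBrowserAfterPageCount: 50,
        // close browsers that sit idle instead of letting them linger in top
        closeInactiveBrowserAfterSecs: 60,
        // cap tabs per browser so a single process can't balloon in memory
        maxOpenPagesPerBrowser: 10,
    },
    requestHandler: async ({ page, request }) => {
        // ...existing handler
    },
})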
maxConcurrency: 200, maxRequestsPerCrawl: 500, maxRequestRetries: 2, requestHandlerTimeoutSecs: 185,
CRAWLEE_AVAILABLE_MEMORY_RATIO=.8
.run()
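Putting those pieces together, here is a sketch of how the options quoted above and the memory ratio would be set programmatically; Configuration.getGlobalConfig().set(...) is assumed here as the in-code equivalent of the CRAWLEE_AVAILABLE_MEMORY_RATIO env var:

import { PuppeteerCrawler, Configuration } from 'crawlee'

// same effect as CRAWLEE_AVAILABLE_MEMORY_RATIO=.8: let the autoscaled pool budget
// up to 80% of the container's memory before reporting "critically overloaded"
Configuration.getGlobalConfig().set('availableMemoryRatio', 0.8)

const crawler = new PuppeteerCrawler({
    maxConcurrency: 200,
    maxRequestsPerCrawl: 500,
    maxRequestRetries: 2,
    requestHandlerTimeoutSecs: 185,
    requestHandler: async ({ page, request }) => {
        // ...existing handler
    },
})

await crawler.run()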
async (crawlingContext, gotoOptions) => {
    const { page, request, crawler } = crawlingContext
    const queue = await crawler.getRequestQueue()
    const crawler_dto = request.userData.crawler_dto
    // for normal HTML pages, wait for the network to settle and set up the page
    if (!request.url.endsWith('.pdf')) {
        gotoOptions.waitUntil = 'networkidle2'
        gotoOptions.timeout = 20000
        await page.setBypassCSP(true)
        await page.setExtraHTTPHeaders({
            'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
        })
        await page.setViewport({ width: 1440, height: 900 })
    }
    // if the response turns out to be a PDF, drop the navigation timeout to 1 ms
    // so the goto bails out immediately instead of hanging
    page.on('response', async (page_response) => {
        if (page_response.headers()['content-type'] === 'application/pdf') {
            gotoOptions.timeout = 1
        }
    })
},
"Failed to launch browser. Please check the following:\n- Check whether the provided executable path \"/usr/bin/google-chrome\" is correct.\n- Try installing a browser, if it's missing, by running `npx @puppeteer/browsers install chromium --path [path]` and pointing `executablePath` to the downloaded executable (https://pptr.dev/browsers-api)\n\nThe original error is available in the `cause` property. Below is the error received when trying to launch a browser:\n","stack":"Failed to launch browser. Please check the following:\n- Check whether the provided executable path \"/usr/bin/google-chrome\" is correct.\n- Try installing a browser, if it's missing, by running `npx @puppeteer/browsers install chromium --path [path]` and pointing `executablePath` to the downloaded executable (https://pptr.dev/browsers-api)\n\nThe original error is available in the `cause` property. Below is the error received when trying to launch a browser:\n\nError: ENOSPC: no space left on device, mkdtemp '/tmp/puppeteer_dev_profile-pXEfmi'\nError thrown at:\n\n at PuppeteerPlugin._throwAugmentedLaunchError (/home/app/node_modules/@crawlee/browser-pool/abstract-classes/browser-plugin.js:145:15)\n at PuppeteerPlugin._launch (/home/app/node_modules/@crawlee/browser-
autoscaledPoolOptions: {
    isFinishedFunction: async () => {
        const web_crawler_queue = await RequestQueue.open(place_id)
        // return this.isFinishedFunction() && await web_crawler_queue.isFinished()
        return await request_queue.isFinished() && await web_crawler_queue.isFinished()
    }
},
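With that override, the pool only reports finished once both queues are drained: RequestQueue.open(place_id) opens (or creates) a named queue, and its isFinished() resolves true only when the queue is empty and every in-progress request has been handled.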
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds. {"id":"icvnyTXX7zWJjgV","url":"https://www.gastongov.com/486/Transportation-Planning","retryCount":2}
{"time":"2024-04-15T00:09:08.818Z","level":"INFO","msg":"PuppeteerCrawler:AutoscaledPool: state","currentConcurrency":1,"desiredConcurrency":1,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.106},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":1},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
I get net::ERR_ABORTED when scraping any type of PDF file, so the only way I've been able to figure out how to handle this is in a preNavigationHook, since I can't catch the error in the router handler.
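For what it's worth, an alternative to the timeout trick in the hook (a sketch, not the approach described above): tag PDF links with skipNavigation when they are enqueued so Chrome never attempts the navigation that ends in net::ERR_ABORTED, and fetch the bytes with the context's plain HTTP client instead. The .pdf-extension check and the option values here are assumptions:

import { PuppeteerCrawler } from 'crawlee'

const crawler = new PuppeteerCrawler({
    requestHandler: async ({ request, sendRequest, enqueueLinks, log }) => {
        if (request.skipNavigation) {
            // the browser never navigated here; download the PDF over plain HTTP
            const { body } = await sendRequest({ responseType: 'buffer' })
            log.info(`Downloaded PDF (${body.length} bytes)`, { url: request.url })
            return
        }
        // normal HTML handling; mark .pdf links so they bypass browser navigation
        await enqueueLinks({
            transformRequestFunction: (req) => {
                if (req.url.endsWith('.pdf')) req.skipNavigation = true
                return req
            },
        })
    },
})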