// Every 3s, check the ratio of finished (=success) and failed requests and stop the process if it's too bad
setInterval(() => {
    const { requestsFinished, requestsFailed } = crawler.stats.state
    if (requestsFailed > requestsFinished + 10) {
        // when failed is 10 more than finished, stop trying bro
        console.warn(`💣 Too many failed requests, stopping! (${requestsFailed} failed, ${requestsFinished} finished)`)
        process.exit(1)
    }
}, 3000)
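The threshold check can also be pulled out into a small pure function, so it can be unit-tested without a running crawler. This is only a sketch: `tooManyFailures`, `watchFailures`, `margin`, `getStats` and `onAbort` are hypothetical names that are not part of Crawlee, and the stats object is assumed to have the same shape as `crawler.stats.state` above.

```javascript
// Hypothetical helper (not a Crawlee API): true when failed requests
// exceed finished requests by more than `margin`.
function tooManyFailures({ requestsFinished, requestsFailed }, margin = 10) {
    return requestsFailed > requestsFinished + margin
}

// Hypothetical wiring: `getStats` returns the current stats object,
// `onAbort` decides what to do (for example log and call process.exit(1)).
function watchFailures(getStats, onAbort, intervalMs = 3000) {
    const timer = setInterval(() => {
        const stats = getStats()
        if (tooManyFailures(stats)) {
            clearInterval(timer) // fire the abort callback only once
            onAbort(stats)
        }
    }, intervalMs)
    timer.unref() // do not keep the Node.js process alive just for this check
    return timer
}
```

Under these assumptions it could be wired up as `watchFailures(() => crawler.stats.state, () => process.exit(1))`. The `unref()` call is a design choice: without it, the interval alone keeps the process running after the crawl has finished.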
storage: new MemoryStorage()
in the second argument of Actor.main, as noted in the docs & TS definitions, but Actor runs on the platform still seem to use the "platform RQ", not the "in-memory one". Any pointers?
CRAWLEE_PURGE_ON_START=false
so it only runs the previously problematic URLs. Iterate a few times to catch all the bugs, and then run the whole crawler with purged storage.