I'm scraping entire sites, running one crawler per site and multiple crawlers at once, with the goal of covering 50+ sites. All the site scrapes are launched from a single start file that uses an event emitter to kick off each site's crawler, specifically the
await crawler.run(startUrls)
line.
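For context, the start file looks roughly like the sketch below; `siteConfigs`, the `scrapeSite` event name, and the handler body are placeholders rather than my exact code:

```js
import { EventEmitter } from 'node:events';
import { CheerioCrawler } from 'crawlee';

// Placeholder config: one entry per site (I have 50+ of these).
const siteConfigs = [
  { name: 'site-a', startUrls: ['https://example-a.com'] },
  { name: 'site-b', startUrls: ['https://example-b.com'] },
];

const emitter = new EventEmitter();

emitter.on('scrapeSite', async (site) => {
  // Each event builds a fresh crawler instance for that site.
  const crawler = new CheerioCrawler({
    requestHandler: async ({ request, $ }) => {
      // site-specific extraction logic goes here
    },
  });
  await crawler.run(site.startUrls);
});

// The start file fires all the sites at once.
for (const site of siteConfigs) {
  emitter.emit('scrapeSite', site);
}
```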
Should I run them all at once in one terminal, or run each one in a separate terminal with its own script?
Also, is running multiple crawler instances at once like this a maintainable approach?
One final problem I'm running into: when I run the start file with multiple crawlers, I get the request queue error below. When I run it again it sometimes works, but the error shows up inconsistently.
Request queue error:
ERROR CheerioCrawler:AutoscaledPool: isTaskReadyFunction failed
[Error: ENOENT: no such file or directory, open 'C:\Users\haris\OneDrive\Documents\GitHub\periodicScraper01\pscrape\storage\request_queues\default\1Rk4szfVGlTLik4.json'] {
errno: -4058,
code: 'ENOENT',
syscall: 'open',
path: 'C:\\Users\\haris\\OneDrive\\Documents\\GitHub\\periodicScraper01\\pscrape\\storage\\request_queues\\default\\1Rk4szfVGlTLik4.json'
}
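In case it matters, each crawler is constructed with default options, so as far as I can tell every instance ends up sharing the same storage/request_queues/default directory on disk; roughly:

```js
// No explicit requestQueue or storage configuration is passed in,
// so every instance appears to use the default on-disk request queue.
const crawler = new CheerioCrawler({ requestHandler });
await crawler.run(startUrls);
```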