site-1
and then site-2
initially before starting the crawler, and then the crawler will dynamically add links as needed. But this will mess up the logs: since we are using a Queue and it's FIFO, it'll first crawl the first link and add the extracted links to the queue, then crawl the second link and add its links to the queue, and so on. It'll keep switching contexts between the two sites, which will make the logs a mess. Also, routers don't seem to have a url parameter; it's just a category and then the request, so we would basically have to define handlers for each site in a single router, right? That will just bloat up a single file.
requestHandler
for everything and decide which function you want to use based on the request url.
    requestHandler: async (context) => {
        const { request } = context;
        if (/mydomain1\.com/.test(request.url)) {
            await processSite1(context);
        } else if (/mydomain2\.com/.test(request.url)) {
            await processSite2(context);
        }
    }
processSite*
funcs with a router and have it handle it on a case-by-case basis?
create***Route()
function? You should be able to create several of these. Calling create***Route()
will return a function and you need to pass the context
to it. Then in the requestHandler
you need to decide which of these routers you want to use; I don't know what your requirements are for this.
RequestHandler
call the relevant route handler according to the site, in the snippet you linked await processSite1(context);
I wanted to know if I could instead import the route from some other place and then pass in the context to it, because in the doc example
await crawler.run('site:1')
await crawler.run('site:2')
export const myRoutes = createRout(); myRoutes.addHandler("LABEL", (context) => { c })
import { myRoutes } from './routes/my-routes.js';
site1
and site2
and they both have separate routes like site1_routes
and site2_routes
, requestHandler
I want to do something like this
    requestHandler: async (context) => {
        const { request } = context;
        if (/mydomain1\.com/.test(request.url)) {
            await site1_routes(context);
        } else if (/mydomain2\.com/.test(request.url)) {
            await site2_routes(context);
        }
    }
createPlaywrightRouter
does take in a context option, but I haven't seen it used with context passed in explicitly anywhere.
    import { Actor } from 'apify';
    import { PuppeteerCrawler, createPuppeteerRouter } from 'crawlee';

    await Actor.init();

    const startUrls = ['https://apify.com', 'https://google.com'];

    // One router per site; each could live in its own file.
    export const apifyRouter = createPuppeteerRouter();
    apifyRouter.addDefaultHandler(async ({ log }) => {
        log.info(`Hello from Apify!`);
    });

    export const googleRouter = createPuppeteerRouter();
    googleRouter.addDefaultHandler(async ({ log }) => {
        log.info(`Hello from Google!`);
    });

    const crawler = new PuppeteerCrawler({
        // Pick the router by request URL and hand it the full context.
        requestHandler: async (context) => {
            if (/apify\.com/.test(context.request.url)) {
                await apifyRouter(context);
            } else if (/google\.com/.test(context.request.url)) {
                await googleRouter(context);
            }
        },
    });

    await crawler.run(startUrls);

    await Actor.exit();
googleRouter
and apifyRouter
out of different files.
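If the number of sites grows, the if/else chain above could be swapped for a lookup table keyed by hostname. The following is only a rough sketch of that idea, not something from the Crawlee docs: `routersByHost`, `pickRouter` and the two stand-in routers are hypothetical names, and the stand-ins are plain async functions used in place of real routers (which, as discussed above, are also just functions that take the context) so the snippet runs without any dependencies.

```javascript
// Hypothetical stand-ins for routers that would normally be created with
// createPuppeteerRouter() in separate files and imported here.
// A router is called like a function: await router(context).
const apifyRouter = async ({ request }) => `apify handled ${request.url}`;
const googleRouter = async ({ request }) => `google handled ${request.url}`;

// Hostname -> router lookup table (assumed name); add one entry per site.
const routersByHost = new Map([
    ['apify.com', apifyRouter],
    ['google.com', googleRouter],
]);

// Return the router registered for the URL's host, also matching subdomains
// such as www.google.com. Returns undefined when no site matches.
function pickRouter(url) {
    const { hostname } = new URL(url);
    for (const [host, router] of routersByHost) {
        if (hostname === host || hostname.endsWith(`.${host}`)) {
            return router;
        }
    }
    return undefined;
}

// Shape of the function you would pass as `requestHandler` to the crawler.
async function requestHandler(context) {
    const router = pickRouter(context.request.url);
    if (!router) {
        throw new Error(`No router registered for ${context.request.url}`);
    }
    return router(context);
}
```

Matching on the parsed hostname rather than running a regex over the whole URL means a link that merely mentions another site in its path or query string would not be sent to the wrong router. With real routers you would replace the two stand-ins with the imported `apifyRouter` and `googleRouter`.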