Hey,
For your task, I'd use 2 request handlers:
- `JSON` handler will handle the JSON response: it'll parse it and enqueue `HTML` requests
- `HTML` handler will parse the HTML response as usual with cheerio's `$`
`JSON` and `HTML` are request labels; you can read more about labels here. Basically, if you label a request with e.g. the `HTML` label, it will be handled by the `HTML` request handler.
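Conceptually, a router just looks up the handler registered for a request's label and passes the crawling context to it. The sketch below is a simplified, hypothetical illustration of that dispatch in plain JavaScript with no Crawlee dependency (`createLabelRouter` is a made-up name). It is not how Crawlee actually implements `createCheerioRouter()`.

```javascript
// Hypothetical, simplified label router for illustration only:
// this is NOT Crawlee's implementation of createCheerioRouter().
function createLabelRouter() {
  const handlers = new Map();

  // The router itself is a function that receives a crawling context
  // and forwards it to the handler registered for the request's label.
  const route = (context) => {
    const label = context.request.userData?.label;
    const handler = handlers.get(label);
    if (!handler) {
      throw new Error(`No handler registered for label "${label}"`);
    }
    return handler(context);
  };

  // Register a handler for a given label.
  route.addHandler = (label, handler) => {
    handlers.set(label, handler);
  };

  return route;
}

// Usage: record which handler processed which request.
const handled = [];
const router = createLabelRouter();
router.addHandler('JSON', ({ request }) => handled.push(`JSON: ${request.url}`));
router.addHandler('HTML', ({ request }) => handled.push(`HTML: ${request.url}`));

router({ request: { url: 'https://example.com/api', userData: { label: 'JSON' } } });
router({ request: { url: 'https://example.com/page', userData: { label: 'HTML' } } });

console.log(handled);
```

A request whose label has no registered handler throws here; the real Crawlee router also lets you register a default handler for unlabelled requests.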
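```javascript
import { CheerioCrawler, createCheerioRouter } from 'crawlee';

const router = createCheerioRouter();

// add request handler for handling `JSON` labelled requests
router.addHandler('JSON', async ({ body, crawler }) => {
    // parse JSON response (`body` may be a Buffer, so convert it to a string first)
    const json = JSON.parse(body.toString());
    // enqueue HTML requests, building the URLs from the data in `json`
    await crawler.addRequests([{ url: '...', userData: { label: 'HTML' } }]);
});

// add request handler for handling `HTML` labelled requests
router.addHandler('HTML', async ({ $ }) => {
    // parse HTML response with cheerio's `$`, e.g. $('title').text()
});

const crawler = new CheerioCrawler({
    // `proxyConfiguration` and `maxRequestsPerCrawl` are your own settings, defined elsewhere
    proxyConfiguration,
    maxRequestsPerCrawl,
    requestHandler: router,
});

// the start request is labelled `JSON`, so it is processed by the JSON handler first
await crawler.run([{ url: '...', userData: { label: 'JSON' } }]);
```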
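If the JSON API returns a list of page URLs, the parsing step of the `JSON` handler could be pulled out into a small helper that is easy to test on its own. This is a sketch under an assumption: the response shape `{ "items": [{ "detailUrl": ... }] }` and the name `jsonToHtmlRequests` are hypothetical, so adapt both to the actual API.

```javascript
// Hypothetical helper: assumes the JSON API returns a body shaped like
// { "items": [{ "detailUrl": "https://..." }, ...] }. Adjust to your API.
function jsonToHtmlRequests(body) {
  // body may be a Buffer or a string, so normalise it before parsing
  const json = JSON.parse(body.toString());
  const items = Array.isArray(json.items) ? json.items : [];
  return items
    // skip entries that do not carry a URL
    .filter((item) => typeof item.detailUrl === 'string')
    .map((item) => ({
      url: item.detailUrl,
      // the HTML label routes these requests to the HTML handler
      userData: { label: 'HTML' },
    }));
}

// Usage outside the crawler, with a sample response body:
const sampleBody = Buffer.from(JSON.stringify({
  items: [
    { detailUrl: 'https://example.com/products/1' },
    { detailUrl: 'https://example.com/products/2' },
    { name: 'no url here' },
  ],
}));
const requests = jsonToHtmlRequests(sampleBody);
console.log(requests.length); // 2
```

Inside the `JSON` handler, the enqueue step could then become `await crawler.addRequests(jsonToHtmlRequests(body));`.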
Let me know if you have any questions