Apify and Crawlee Official Forum

Scrape JSON and HTML responses in different handlers

I don't know how to scrape a website that returns both JSON and HTML responses.

My scraper needs to:
  1. Send a request and parse a JSON response that contains a list of URLs, which I will enqueue.
  2. Scrape those URLs, which return HTML, using cheerio or whatever else is required.
1 comment
Hey,

For your task, I'd use two request handlers:
  • The JSON handler handles the JSON response: it parses it and enqueues the HTML requests.
  • The HTML handler parses the HTML response as usual with cheerio's $.
JSON and HTML are request labels; you can read more about labels here. Basically, if you label a request with, e.g., the HTML label, it will be handled by the HTML request handler.

import { CheerioCrawler, createCheerioRouter } from 'crawlee';

const router = createCheerioRouter();

// Request handler for requests labelled `JSON`
router.addHandler('JSON', async ({ body, crawler }) => {
    // Parse the JSON response
    const json = JSON.parse(body.toString());

    // Enqueue the URLs found in `json` as `HTML`-labelled requests
    await crawler.addRequests([{ url: '...', userData: { label: 'HTML' } }]);
});

// Request handler for requests labelled `HTML`
router.addHandler('HTML', async ({ $ }) => {
    // Parse the HTML response with cheerio's $
});

// `proxyConfiguration` and `maxRequestsPerCrawl` are assumed to be defined elsewhere
const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl,
    requestHandler: router,
});

// Start with the JSON endpoint; its handler enqueues the HTML pages
await crawler.run([{ url: '...', userData: { label: 'JSON' } }]);
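
The '...' placeholder in the JSON handler depends on what your JSON actually looks like. As a rough sketch only, assuming a hypothetical payload shaped like { items: [{ url }] } (both `items` and `url` are made-up names, as is the `buildHtmlRequests` helper), turning the parsed JSON into HTML-labelled requests might look like this:

```javascript
// Hypothetical helper: converts a parsed JSON payload into request objects
// labelled `HTML`. The payload shape { items: [{ url }] } is an assumption;
// adjust the property names to match the real response.
function buildHtmlRequests(payload, baseUrl) {
    const items = Array.isArray(payload?.items) ? payload.items : [];
    const seen = new Set();
    const requests = [];

    for (const item of items) {
        // Skip entries that carry no usable URL
        if (typeof item?.url !== 'string') continue;

        let absolute;
        try {
            // Resolve relative links against the URL the JSON came from
            absolute = new URL(item.url, baseUrl).href;
        } catch {
            continue; // skip malformed URLs
        }

        // Drop duplicates within the same payload
        if (seen.has(absolute)) continue;
        seen.add(absolute);

        requests.push({ url: absolute, userData: { label: 'HTML' } });
    }
    return requests;
}

// Sample payload, invented for illustration
const sample = { items: [{ url: '/a' }, { url: 'https://example.com/b' }, { url: '/a' }, {}] };
console.log(buildHtmlRequests(sample, 'https://example.com/api/list'));
```

Inside the JSON handler you could then destructure `request` from the context alongside `body` and `crawler`, and call `await crawler.addRequests(buildHtmlRequests(json, request.url));` instead of passing the hard-coded array.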


Let me know if you have any questions.