Apify and Crawlee Official Forum

Updated 2 months ago

Anyone have any example scraping multiple different websites?

The structure I am using does not look like the best.

I am basically creating several routers and then doing something like:

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
  // proxyConfiguration: new ProxyConfiguration({ proxyUrls: ['...'] }),
  requestHandler: async (ctx) => {
    if (ctx.request.url.includes("url1")) {
      await url1Router(ctx);
    }

    if (ctx.request.url.includes("url2")) {
      await url2Router(ctx);
    }

    if (ctx.request.url.includes("url3")) {
      await url3Router(ctx);
    }

    // Note: this re-exports the whole dataset after every single request.
    await Dataset.exportToJSON("data.json");
  },

  // Comment this option out to scrape the full website.

  //   maxRequestsPerCrawl: 20,
});


This does not seem correct. Anyone with a better way?
8 comments
Create a route for each URL, then use labels to identify them.
@Marco, how far is that from what I am doing there? It seems like somewhere I will have to do it anyway. In the example above I did a router per URL: url1Router, url2Router are defined on a per-URL basis. Am I wrong?
It's actually very similar. Routes should be defined depending on your needs, so if you need a route per URL, just do that.
My concern is that I have multiple websites, not just different URLs. Each website might have two URLs that I have to scrape independently. Is that how you would do it, @Marco? Would you have multiple routers?
Oh, I see. I think I would still use one router, with labels such as "website1-page2", to keep things simple; a function called at the beginning would assign the correct label to each request based on the URL.