Apify and Crawlee Official Forum

Updated 3 weeks ago

Massive Scraper

Hi, I have a (noob) question. I want to crawl many different URLs from different pages, so they need their own crawler implementations (some can share the same one). How can I achieve this in Crawlee so that they run in parallel and can all be executed with a single command, or also run in isolation?

Input, example repos, etc. would be highly appreciated.
2 comments
You gave very little information; developers would have more questions than answers after reading your message, which is probably why you didn't get any reply.

Here is a good example of a scraping system implementation with high scalability options and good monitoring tools. If you need a so-called one-time scraper it would be a bad example, but for a long-term project it is one of the best: https://github.com/68publishers/crawler

For future questions, I would recommend sticking to these guidelines: https://stackoverflow.com/help/how-to-ask
You can have multiple page handlers by using a router. This lets you change which handler a page is processed by: set the label property when enqueuing a link (see the handler sketch after the example below). Here's an example:


TypeScript
import {
  Configuration,
  createPlaywrightRouter,
  PlaywrightCrawler,
  PlaywrightCrawlerOptions,
} from 'crawlee';

const crawlerConfig = new Configuration({
  // config options
});

// Each label maps requests to their own handler function.
const router = createPlaywrightRouter();
router.addHandler('label1', label1Handler);
router.addHandler('label2', label2Handler);
router.addHandler('label3', label3Handler);

const crawlerOptions: PlaywrightCrawlerOptions = {
  requestHandler: router,
  // crawler options
};

const crawler = new PlaywrightCrawler(crawlerOptions, crawlerConfig);

// The label on each request decides which handler processes it.
await crawler.run([
  { url: 'url1', label: 'label1' },
]);
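
The handler functions receive the crawling context, and you can set the label when enqueuing links from inside a handler so newly discovered pages are routed to a different handler. A minimal sketch of what label1Handler could look like (the selector and label names are placeholders):

TypeScript
import { PlaywrightCrawlingContext } from 'crawlee';

const label1Handler = async ({ page, enqueueLinks, log }: PlaywrightCrawlingContext) => {
  log.info(`Processing ${page.url()}`);
  // Links matching the selector are enqueued with the 'label2' label,
  // so the router dispatches them to label2Handler.
  await enqueueLinks({
    selector: 'a.detail', // placeholder selector
    label: 'label2',
  });
};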
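
As for the original question about running many site-specific crawlers in parallel or in isolation: each crawler's run() is just a promise, so one pattern is a small entry script that keeps a factory per site and either starts them all with Promise.all or only the one named on the command line. A rough sketch, assuming hypothetical siteA/siteB modules that each export a function returning a fully configured crawler (in practice you'd also give each crawler its own Configuration or named request queue so their storages don't collide):

TypeScript
import { PlaywrightCrawler } from 'crawlee';
// Hypothetical per-site modules, each exporting a factory for a configured crawler.
import { buildSiteACrawler } from './crawlers/siteA';
import { buildSiteBCrawler } from './crawlers/siteB';

const crawlers: Record<string, () => PlaywrightCrawler> = {
  siteA: buildSiteACrawler,
  siteB: buildSiteBCrawler,
};

// `node main.js` runs every crawler in parallel; `node main.js siteA` runs one in isolation.
const target = process.argv[2];
if (target) {
  await crawlers[target]().run();
} else {
  await Promise.all(Object.values(crawlers).map((build) => build().run()));
}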