Apify Discord Mirror

Updated 2 months ago

Massive Scraper

At a glance

The community member has a question about how to crawl multiple URLs from different pages using Crawlee, and run them in parallel with a single command or in isolation. The first comment suggests that the question lacks details, and provides a link to a GitHub repository as an example of a scalable scraping system. The second comment provides an example of using a router to handle multiple page handlers in Crawlee, allowing the community member to change the handler for a page by setting the label property when enqueueing a link.

Hi, I have a (noob) question. I want to crawl many different URLs from different pages, so they need their own crawler implementations, although some can share one. How can I achieve this in Crawlee so that they run in parallel and can all be executed with a single command, or individually in isolation?

Input, example repos, etc. would be highly appreciated.
You gave a very small piece of information. Developers would have more questions than answers after reading your message, which is why I assume you didn't get any reply.

Here is a good example of a scraping system implementation with high scalability options and good monitoring tools. If you need a so-called one-time scraper it would be a bad example, but for a long-term project it would be one of the best - https://github.com/68publishers/crawler

For future questions, I would recommend sticking to these rules - https://stackoverflow.com/help/how-to-ask
You can have multiple page handlers by using a router. This lets you change which handler processes a page by setting the label property when enqueuing a link. Here's an example:

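  import { Configuration, PlaywrightCrawler, PlaywrightCrawlerOptions, createPlaywrightRouter } from 'crawlee';

  const crawlerConfig = new Configuration({
    //config options
  });

  const router = createPlaywrightRouter();
  // one handler per label; requests without a matching label use the default handler
  router.addHandler('label1', async ({ request, log }) => {
    log.info(`label1 handler: ${request.url}`);
  });
  router.addDefaultHandler(async ({ request, log }) => {
    log.info(`default handler: ${request.url}`);
  });

  const crawlerOptions: PlaywrightCrawlerOptions = {
    requestHandler: router,
    //crawler options
  };

  const crawler = new PlaywrightCrawler(crawlerOptions, crawlerConfig);

  // the original snippet was cut off here; passing the requests to crawler.run() is an assumption
  await crawler.run([
    {url: 'url1', label: 'label1'},
  ]);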
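For the original question about running several site-specific crawlers with one command or one at a time, one possible layout is a registry of named crawl jobs plus a small selector driven by command-line arguments. The sketch below is dependency-free so it runs as-is. The names `registry`, `selectJobs`, `runJobs` and the site keys are hypothetical and not part of Crawlee; in a real project each job would presumably build its own crawler (for example a `PlaywrightCrawler` with its own router) and await `crawler.run()`.

```typescript
// A crawl job is any async function. In practice it would create a crawler
// and await crawler.run(); here each job just returns a status string.
type CrawlJob = () => Promise<string>;

// Hypothetical registry: one entry per site-specific crawler.
const registry: Record<string, CrawlJob> = {
  shopA: async () => 'shopA done',
  shopB: async () => 'shopB done',
  blogC: async () => 'blogC done',
};

// Pick jobs by name; an empty selection means "run everything".
function selectJobs(names: string[], jobs: Record<string, CrawlJob>): string[] {
  if (names.length === 0) return Object.keys(jobs);
  const unknown = names.filter(
    (name) => !Object.prototype.hasOwnProperty.call(jobs, name),
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown crawler(s): ${unknown.join(', ')}`);
  }
  return names;
}

// Start the selected jobs together and wait for all of them to finish.
async function runJobs(names: string[], jobs: Record<string, CrawlJob>): Promise<string[]> {
  return Promise.all(selectJobs(names, jobs).map((name) => jobs[name]()));
}

// Possible entry point: `node main.js` runs all jobs, `node main.js shopA` runs one.
// runJobs(process.argv.slice(2), registry).then(console.log);
```

If several real crawlers are started in the same process this way, it is worth checking whether they end up sharing default storage such as the request queue, and giving each its own if so.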