Apify and Crawlee Official Forum

Updated 3 months ago

Infinite scrolling

I'm trying to get infinite scrolling to render all the products so I can scrape them as the page scrolls down.
I looked at the documentation but didn't understand how to do this:
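```typescript
import { createPlaywrightRouter, Dataset } from 'crawlee';

export const kotnRouter = createPlaywrightRouter();

// Running total of product links found across all listing pages.
let linksCount = 0;

kotnRouter.addHandler('KOTN_DETAIL', async ({ log, infiniteScroll, parseWithCheerio, enqueueLinks }) => {
    // Scroll first so every lazy-loaded product is in the DOM before parsing.
    // maxScrollHeight: 0 means no height limit.
    log.info('Scrolling to load all products');
    await infiniteScroll({ maxScrollHeight: 0 });

    log.info('Scraping product URLs');
    const $ = await parseWithCheerio();

    const productUrls: string[] = [];
    $('a').each((_, el) => {
        const href = $(el).attr('href');
        if (!href) return;
        let productUrl: string;
        try {
            // Resolve relative links against the site; absolute links pass through unchanged.
            productUrl = new URL(href, 'https://www.kotn.com').href;
        } catch {
            return; // skip malformed hrefs
        }
        if (productUrl.includes('/products')) {
            productUrls.push(productUrl);
        }
    });

    // Deduplicate before storing and enqueueing.
    const uniqueProductUrls = Array.from(new Set(productUrls));

    await Dataset.pushData({ urls: uniqueProductUrls });

    // Assumes KOTN_PRODUCT is handled by this same crawler: enqueueLinks adds the
    // product pages to its queue and the running crawler processes them itself,
    // so there is no need to call run() from inside a request handler.
    await enqueueLinks({ urls: uniqueProductUrls, label: 'KOTN_PRODUCT' });

    linksCount += uniqueProductUrls.length;
    log.info(`Total product links scraped so far: ${linksCount}`);
});
```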
I also want it to scroll up a little every time it reaches the bottom, to make sure the products render properly.
To make it scroll up a bit after every scroll down, you can use the `scrollDownAndUp` option:
https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#scrollDownAndUp
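A minimal sketch of that option, assuming the Crawlee 3.x `infiniteScroll` options `scrollDownAndUp`, `maxScrollHeight`, `timeoutSecs` and `waitForSecs`. So that it runs without a browser, it types only the one context method it calls; the context object a Playwright router handler receives has that method, so you can pass it straight in. The helper name `scrollAllProducts` and the 120-second cap are illustrative choices, not part of Crawlee.

```typescript
// Assumption: mirrors a subset of Crawlee's InfiniteScrollOptions.
interface ScrollOptions {
    timeoutSecs?: number;      // 0 = no time limit
    maxScrollHeight?: number;  // 0 = no height limit
    waitForSecs?: number;      // stop after this many seconds without new content
    scrollDownAndUp?: boolean; // scroll back up a little after each scroll down
}

// Only the part of the Playwright crawling context used here.
interface InfiniteScrollContext {
    infiniteScroll(options?: ScrollOptions): Promise<void>;
}

// Hypothetical helper: scroll until no more products load, nudging the page
// back up after every scroll so lazily rendered items get a chance to appear.
export async function scrollAllProducts(ctx: InfiniteScrollContext): Promise<void> {
    await ctx.infiniteScroll({
        scrollDownAndUp: true,
        maxScrollHeight: 0,
        timeoutSecs: 120, // illustrative safety cap so a handler cannot scroll forever
        waitForSecs: 4,
    });
}
```

In the `KOTN_DETAIL` handler you would call it before parsing, e.g. `await scrollAllProducts(ctx)` followed by `await ctx.parseWithCheerio()`, so the parse sees every product that has loaded.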
For scraping the products, you can either:

Wait for the scroll to finish and then select all the products and add them to the queue.

Or

Wrap infiniteScroll in a Promise.all or Promise.race, so it keeps scrolling while another function in the same Promise.all or Promise.race runs alongside it.

Or

Run infiniteScroll and, inside the stopScrollCallback option, collect the products and stop scrolling once no new ones appear.
https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#stopScrollCallback
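A sketch of the third approach, assuming the `stopScrollCallback` behavior described in the linked docs (it is called after each scroll, and a truthy return value stops scrolling). `getProductUrls` is a hypothetical function you supply that reads the product links currently on the page; `idleMs` and `now` are illustrative parameters. The stop rule is time-based rather than "stop after one round with nothing new" because the callback runs after every scroll step, often faster than the site loads the next batch.

```typescript
// Assumption: a subset of Crawlee's InfiniteScrollOptions.
interface ScrollOptions {
    scrollDownAndUp?: boolean;
    stopScrollCallback?: () => unknown | Promise<unknown>;
}

interface InfiniteScrollContext {
    infiniteScroll(options?: ScrollOptions): Promise<void>;
}

// Hypothetical: returns the product URLs currently rendered on the page.
type GetProductUrls = () => Promise<string[]>;

// Scroll while collecting product URLs; stop once no new URL has appeared
// for `idleMs` milliseconds. Returns the unique URLs found.
export async function scrollAndCollect(
    ctx: InfiniteScrollContext,
    getProductUrls: GetProductUrls,
    idleMs = 5000,
    now: () => number = Date.now,
): Promise<string[]> {
    const seen = new Set<string>();
    let lastNewAt = now();

    // Add whatever is on the page now; remember when something new last showed up.
    const collect = async (): Promise<void> => {
        const before = seen.size;
        for (const url of await getProductUrls()) seen.add(url);
        if (seen.size > before) lastNewAt = now();
    };

    await ctx.infiniteScroll({
        scrollDownAndUp: true,
        // Returning true tells infiniteScroll to stop scrolling.
        stopScrollCallback: async () => {
            await collect();
            return now() - lastNewAt >= idleMs;
        },
    });

    // One last pass in case infiniteScroll ended on its own (timeout or no new content).
    await collect();
    return Array.from(seen);
}
```

For `getProductUrls`, one option inside a handler is a function that calls `parseWithCheerio()` and returns the `href` of every product link; keep in mind that it then re-parses the page after every scroll step.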
How do you implement this in the router? Do you write it under a playwrightUtils class, or what do you do?
Hey, you can either use the context-aware method from the context object, or the method from playwrightUtils/puppeteerUtils, which takes the page object as an argument.
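To illustrate the two styles in a sketch: the context-aware helper already knows the current page, while the standalone `playwrightUtils.infiniteScroll` (or `puppeteerUtils.infiniteScroll`) takes the page as its first argument. The structural types below are an assumption standing in for Crawlee's real ones so the sketch runs without a browser; the function names `scrollWithContext` and `scrollWithUtils` are hypothetical.

```typescript
interface ScrollOptions {
    scrollDownAndUp?: boolean;
    maxScrollHeight?: number;
}

// What a Playwright/Puppeteer handler receives (only the members used here).
interface HandlerContext<Page> {
    page: Page;
    infiniteScroll(options?: ScrollOptions): Promise<void>;
}

// Shape of the standalone helper on playwrightUtils / puppeteerUtils.
interface ScrollUtils<Page> {
    infiniteScroll(page: Page, options?: ScrollOptions): Promise<void>;
}

const scrollOptions: ScrollOptions = { scrollDownAndUp: true, maxScrollHeight: 0 };

// Style 1: context-aware method; the current page is already bound.
export async function scrollWithContext<Page>(ctx: HandlerContext<Page>): Promise<void> {
    await ctx.infiniteScroll(scrollOptions);
}

// Style 2: standalone utility; pass the page object explicitly.
export async function scrollWithUtils<Page>(
    ctx: HandlerContext<Page>,
    utils: ScrollUtils<Page>,
): Promise<void> {
    await utils.infiniteScroll(ctx.page, scrollOptions);
}
```

Neither style needs a class: the call goes straight into the handler, e.g. `await scrollWithContext(ctx)`, or `await scrollWithUtils(ctx, playwrightUtils)` with `playwrightUtils` imported from `crawlee`.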