Apify Discord Mirror

Updated 5 months ago

infiniteScroll and enqueueLinksByClickingElements

At a glance

The community member is trying to crawl a page with lazy-loaded images and a JavaScript button that expands the pages of posts. They are using the provided code, but the requests are not being fulfilled, with the stats showing 'requestsTotal:0'. The community member is trying to target the "Load More" button on the first page, which seems to be the only place it appears. They are also using preNavigationHooks to read the network requests and store image links, but are unsure if this code should be in the preNavigation hooks instead.

In the comments, another community member suggests that one problem is units: the infiniteScroll options (timeoutSecs, waitForSecs) take seconds, but the original code passed values sized for milliseconds. They have updated the code to use seconds (timeoutSecs: 5, waitForSecs: 3) and now call enqueueLinksByClickingElements and infiniteScroll through playwrightUtils.

There is no explicitly marked answer in the post or comments.

Hello Team,

I'm trying to crawl a page that has lazy loaded images (on scroll) and an element on the first page that is a JS event 'button' that expands the pages of "posts" on the page.

I'm trying to use the below code, however, it seems like the request never gets filled, the stats show 'requestsTotal:0'.

async requestHandler({ request, page, enqueueLinks, enqueueLinksByClickingElements, infiniteScroll, log }) {

    // Extract links from the current page
    // and add them to the crawling queue.
    await enqueueLinks({ 
      ...<snip>
    });

    await enqueueLinksByClickingElements({
      page,
      selector: '.js-page-load',
      requestQueue: rQueue,
    });

    await infiniteScroll(page, { timeoutSecs: 3000, waitForSecs: 1000 });
    
}

I'm trying to target this:
<button class="pagination-load js-page-load" data-href="/page/2/">Load More <span></span></button>

This is on the first page. So far, it seems this button appears only on the first page.

I'm also using preNavigationHooks to read the network requests and store image LINKS only. I don't know if this code should be in preNavigation hooks instead? Not sure. Thanks for your help as always.
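For context on the preNavigationHooks question, here is a minimal sketch of collecting image links from network requests. It assumes the `page` object is a Playwright Page, where `page.on('request', ...)`, `request.resourceType()` and `request.url()` are available. The helper name `collectImageUrls` and the shared `imageUrls` set are hypothetical, not something from the original code. Attaching the listener in a pre-navigation hook means it is registered before the page loads, so requests fired during the initial load are seen too.

```javascript
// Hypothetical helper: records the URL of every image request the page makes.
// `page` is assumed to be a Playwright Page (anything with an `on` method works).
function collectImageUrls(page, imageUrls) {
  page.on('request', (request) => {
    // Playwright reports 'image' as the resource type for image requests.
    if (request.resourceType() === 'image') {
      imageUrls.add(request.url());
    }
  });
  return imageUrls;
}

// Shared store for this sketch; a Set drops duplicate URLs.
const imageUrls = new Set();

// Assumed wiring: this array would be passed as the crawler's
// `preNavigationHooks` option so the listener exists before navigation.
const preNavigationHooks = [
  async ({ page }) => {
    collectImageUrls(page, imageUrls);
  },
];
```

Lazy-loaded images requested later, for example while scrolling in the request handler, would reach the same listener, since it stays attached to the page.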
2 comments
OK, well, one problem is that these options are in seconds, not milliseconds, so the queue wasn't requesting anything because it was stuck waiting. I've updated that and a test run is now in progress.
I also switched to calling the methods through playwrightUtils:
await playwrightUtils.enqueueLinksByClickingElements({
  page,
  selector: '.js-page-load',
  requestQueue: rQueue,
});

// Both values are in seconds.
const infiniteScrollOptions = {
  timeoutSecs: 5,
  waitForSecs: 3,
};
await playwrightUtils.infiniteScroll(page, infiniteScrollOptions);
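To make the unit mix-up concrete, here is a small sketch that converts millisecond values into the second-based options used above. The helper `toInfiniteScrollOptions` is hypothetical and not part of Crawlee. It only shows that the original `3000` and `1000`, if meant as milliseconds, correspond to 3 and 1 seconds.

```javascript
// Hypothetical helper (not part of Crawlee): converts millisecond values
// into the second-based options that infiniteScroll expects.
function toInfiniteScrollOptions({ timeoutMs, waitForMs }) {
  return {
    timeoutSecs: timeoutMs / 1000,
    waitForSecs: waitForMs / 1000,
  };
}

// Values from the original snippet, read as milliseconds.
const options = toInfiniteScrollOptions({ timeoutMs: 3000, waitForMs: 1000 });
console.log(options); // { timeoutSecs: 3, waitForSecs: 1 }

// The same number passed directly as timeoutSecs is a 50-minute timeout.
const timeoutMinutes = 3000 / 60;
console.log(timeoutMinutes); // 50
```

Passed as-is, `timeoutSecs: 3000` lets the scroll run for up to 50 minutes, which would explain a handler that appears to hang.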