Apify and Crawlee Official Forum

Updated 4 months ago

How to determine if dynamic content is loaded or not (PuppeteerCrawler)

In the requestHandler I'm trying to click the pagination "next" button, and I cannot determine whether the content has changed or not.
How can I do that? waitForNetworkIdle does not seem to work here. Any ideas? See the GIF.

JavaScript
import fs from 'node:fs';
import { PuppeteerCrawler } from 'crawlee';

const maxNumberOfPages = 10; // example value; defined elsewhere in my code

const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Save the JSON body of every matching API response to a file
            page.on('response', async (res) => {
                if (res.url().includes('api/offersearches/filters')) {
                    try {
                        const json = await res.json();
                        const jsonString = JSON.stringify(json);
                        const filePath = 'data.json';
                        fs.appendFile(filePath, jsonString + '\n', () => {});
                    } catch (err) {
                        console.error("Response wasn't JSON or failed to parse response.");
                    }
                }
            });
        },
    ],
    async requestHandler({ request, page }) {
        for (let i = 0; i < maxNumberOfPages; i++) {
            // Stop when the "next" button is disabled (last page reached)
            const isDisabled = await page.evaluate(
                () => document.querySelector('[data-testid="mo-pagination-next"] button.mo-button--pagination').disabled,
            );
            if (isDisabled) {
                break;
            }

            await Promise.all([
                page.waitForNetworkIdle(),
                page.click('[data-testid="mo-pagination-next"] button.mo-button--pagination'),
            ]);
            console.log('clicked'); // it never reaches this line
        }
    },
});


Here's my code so far. Currently the button is clicked OK and the data is fetched OK; it just hangs at the end. I guess waitForNetworkIdle is never resolving.
Attachment: screencast_2024-05-28_12-08-08.gif
4 comments
Hi,
You might want to try the Page.waitForFunction() method:
https://pptr.dev/api/puppeteer.page.waitforfunction

Or you could wait for a specific selector that appears once the request resolves:
https://pptr.dev/api/puppeteer.page.waitforselector

Or you could wait for the request that fetches the data with the Page.waitForResponse() method:
https://pptr.dev/api/puppeteer.page.waitforresponse

It depends on what works best for you 🙂. Hope this helps.
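
For example, a rough (untested) sketch of the waitForResponse approach, assuming the pagination click triggers the same api/offersearches/filters request your preNavigationHook already listens for:

const [response] = await Promise.all([
    page.waitForResponse(
        // wait for the XHR that carries the next page of results
        (res) => res.url().includes('api/offersearches/filters') && res.ok(),
        { timeout: 30000 },
    ),
    page.click('[data-testid="mo-pagination-next"] button.mo-button--pagination'),
]);
// response.json() now holds the data for the newly loaded page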
Thank you for your response.
waitForSelector does not seem to work, as the HTML is already there; it just updates once the AJAX request completes.

waitForResponse is fired before the DOM is changed, so this does not work either. I was able to make it work by adding a 3-second timer after the response is ready, but I guess that's not the right way.

waitForFunction - I am not sure how I can utilize this in my case.
Anyway, I was able to implement the scraper by adding the next-page URL to the request queue instead. So the task is complete, but the question I asked is still open for me :(
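
For reference, a rough sketch of that request-queue approach (untested). The 'page' query parameter and the page number kept in userData are hypothetical and depend on the site's URL scheme; maxNumberOfPages is the same limit as in the original snippet:

async requestHandler({ request, page, crawler }) {
    // ... scrape the current page here ...

    // Enqueue the next page instead of clicking the button
    const pageNo = Number(request.userData.pageNo ?? 1);
    if (pageNo < maxNumberOfPages) {
        const nextUrl = new URL(request.url);
        nextUrl.searchParams.set('page', String(pageNo + 1)); // hypothetical parameter name
        await crawler.addRequests([
            { url: nextUrl.toString(), userData: { pageNo: pageNo + 1 } },
        ]);
    }
},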
Waiting for some time with page.waitForTimeout is another option, but of course if the website responds with some delay it will cause trouble, so I would use it only as a last resort if nothing else works.

With waitForFunction you could save the initial content:
const initialContent = await page.evaluate(() => document.querySelector('[data-testid="content-element"]').textContent);

and then wait for it to change:
await page.waitForFunction(
    (initialContent) => {
        const newContent = document.querySelector('[data-testid="content-element"]').textContent;
        return newContent !== initialContent;
    },
    { timeout: 10000 },
    initialContent,
);

But if you can just add the URLs to the request queue, then I would go with that approach.
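
Putting that suggestion together with the original pagination loop, a minimal untested sketch might look like this (the '[data-testid="content-element"]' selector is a placeholder for whatever element actually holds the results):

for (let i = 0; i < maxNumberOfPages; i++) {
    const isDisabled = await page.evaluate(
        () => document.querySelector('[data-testid="mo-pagination-next"] button.mo-button--pagination')?.disabled,
    );
    if (isDisabled) break;

    // Remember what is currently rendered...
    const initialContent = await page.evaluate(
        () => document.querySelector('[data-testid="content-element"]')?.textContent,
    );

    // ...click, then wait until that content actually changes
    await page.click('[data-testid="mo-pagination-next"] button.mo-button--pagination');
    await page.waitForFunction(
        (prev) => document.querySelector('[data-testid="content-element"]')?.textContent !== prev,
        { timeout: 10000 },
        initialContent,
    );

    // extract the newly rendered offers here
}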