Apify and Crawlee Official Forum

Updated 4 months ago

PuppeteerCrawler waitForResponse timeout issue. Seems like it skips the desired request

I'm trying to get the data from an AJAX POST call (GraphQL) on a webpage, but it does not seem to work.
I have tried running the crawler in headful mode with the network tab open: the request is being made and the response is there, but waitForResponse still times out.
Here's my code:
Plain Text
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestQueue,
    maxRequestRetries: 5,
    navigationTimeoutSecs: 180,
    requestHandlerTimeoutSecs: 180,
    async requestHandler({ request, page }) {
        // ...
        log.warning('GraphQL starting to wait');

        await page.waitForNetworkIdle();

        log.warning('IDLE!!!');

        await page.waitForRequest(
            (req) => req.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH),
        );

        log.warning('GraphQL request is done');

        const response = await page.waitForResponse(
            (httpResponse) => httpResponse.status() === 200 && httpResponse.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH),
            { timeout: 180 * 1000 },
        );

        log.warning('GraphQL response arrived');

        const data = await response.json();
        // ...
    },
});

As you can see, I also added waitForNetworkIdle for testing, and it resolves before waitForResponse, which is strange. See the logs:
Plain Text
INFO  Page opened. {"label":"vehicle","url":"https://www.autotrader.co.uk/car-details/202307270142806?sort=relevance&advertising-location=at_cars&make=Audi&model=A2&postcode=PO16%207GZ&fromsra"}
WARN  GraphQL starting to wait
WARN  IDLE!!!
WARN  PuppeteerCrawler: Reclaiming failed request back to the list or queue. Timed out after waiting 30000ms


Maybe I'm missing something?
By the way, the code was written for Apify SDK v1 and was working fine. After upgrading to v3 it either stopped working or became extremely slow.
9 comments
I would use the page.on('response') event and just add a condition for that particular link. If you keep struggling, DM me.
https://stackoverflow.com/questions/77397585/how-to-wait-for-specific-ajax-request-in-puppeteer-crawler
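The page.on('response') approach mentioned above might look roughly like this. This is a minimal sketch, not the poster's actual code: the GRAPHQL_PATH value is a placeholder standing in for URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH, and the helper names are illustrative.

```javascript
// Placeholder for URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH (an assumption).
const GRAPHQL_PATH = '/graphql';

// A small pure predicate keeps the matching logic testable in isolation.
const isGraphQLResponse = (url, status) =>
    status === 200 && url.includes(GRAPHQL_PATH);

// Register the listener BEFORE triggering the action that fires the
// GraphQL call, so the response cannot arrive unnoticed.
function waitForGraphQL(page, timeoutMs = 30000) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(
            () => reject(new Error('Timed out waiting for GraphQL response')),
            timeoutMs,
        );
        const onResponse = async (response) => {
            if (isGraphQLResponse(response.url(), response.status())) {
                clearTimeout(timer);
                page.off('response', onResponse);
                resolve(await response.json());
            }
        };
        page.on('response', onResponse);
    });
}
```

The key difference from waitForResponse is that the listener is attached up front, so it survives across navigations triggered inside the handler.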

A few months ago I was fixing this exact scraper and had this same issue, but I was able to solve it with waitForResponse, and it was working fine with Apify SDK v1.
Now, with Apify SDK v3, it's not working as expected.
You need to add waiting for the response in preNavigationHooks, as described here: https://docs.apify.com/academy/node-js/how_to_fix_target-closed#page-closed-solution
Thank you very much, this is exactly what I was looking for 🥹
I've tried the above example and in my case the context is always undefined:

Plain Text
preNavigationHooks: [
    async ({ page, context }) => {
        log.info('context', { context, type: typeof context });
        context.responsePromise = page
            .waitForResponse(`https://www.autotrader.co.uk${URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH}`)
            .catch((e) => e);
    },
],


Plain Text
INFO  context {"type":"undefined"}
WARN  PuppeteerCrawler: Reclaiming failed request back to the list or queue. TypeError: Cannot set properties of undefined (setting 'responsePromise')
Plain Text
    preNavigationHooks: [
        async (context) => {
            context.responsePromise = context.page
                .waitForResponse((httpResponse) => httpResponse.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH))
                .catch((e) => e);
        },
    ],
    async requestHandler({ request, page, responsePromise }) {
        // ...
    },


Basically, I did this in the end and it is working.
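For completeness, the hand-off between the hook and the handler can be exercised with plain promises. This is a minimal sketch under the assumption that the handler simply awaits the promise registered by the hook and checks for the Error value produced by the .catch((e) => e) above; the names and the simulated context are illustrative, not Crawlee's API.

```javascript
// Sketch of consuming the promise registered in preNavigationHooks.
async function handleResponse(context) {
    // The .catch((e) => e) in the hook resolves rejections to an Error
    // value, so check for it instead of treating it as a response.
    const response = await context.responsePromise;
    if (response instanceof Error) throw response;
    return response.json();
}

// Simulated hand-off: a plain promise stands in for page.waitForResponse.
const simulatedContext = {};
simulatedContext.responsePromise = Promise
    .resolve({ json: async () => ({ data: { vehicle: {} } }) })
    .catch((e) => e);
```

Because the promise is created before navigation, the response cannot be missed even if it arrives while the page is still loading.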

Thanks very much for sharing the right article.