Apify and Crawlee Official Forum

Members
nathanist
Offline, last seen last month
Joined August 30, 2024
Here's what happened, in order:

  • I shared my actor with my secondary account, with permissions to run, build, write, and read;
  • On my secondary account, I tried creating a new version of my actor;
  • All the previous versions disappeared;
  • I tried saving the new version I had just created, but I got the error "Source could not be saved (Actor version was not found)";
  • When I try to create and save a new version, I get the same error on both accounts.
My actor is this one: https://apify.com/natanielsantos/shein-scraper
3 comments
I'm trying to subscribe to the Creator Plan, but when I click 'Subscribe' on https://apify.com/pricing/creator-plan, it redirects to the Console and nothing happens. I guess this is a bug?
6 comments
Is there a way to prevent the crawler from adding a failed request to the default RequestQueue?

Plain Text
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    maxRequestRetries: 25,
    requestList: await RequestList.open(null, [initUrl]),
    requestHandlerTimeoutSecs: 2000,
    maxConcurrency: 1,
}, config);

I'm using the default RequestQueue to add productUrls, and they're handled inside the default request handler. When one of them fails, I purposely throw an error, expecting the failed request (which is the initUrl) to go back to the RequestList, but it gets added to the default RequestQueue as well, which is not what I want.
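
One thing worth trying, sketched below and not a confirmed fix: Crawlee's errorHandler runs whenever the requestHandler throws, before the request is reclaimed for a retry, and setting request.noRetry there suppresses further retries of that request. The proxyConfiguration, router, initUrl, and config variables are the same ones from the snippet above; the URL comparison is just a hypothetical way to single out the initial request.

Plain Text
import { PuppeteerCrawler, RequestList } from 'crawlee';

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    maxRequestRetries: 25,
    requestList: await RequestList.open(null, [initUrl]),
    requestHandlerTimeoutSecs: 2000,
    maxConcurrency: 1,
    // Runs every time the requestHandler throws, before the request is reclaimed.
    errorHandler: async ({ request, log }, error) => {
        if (request.url === initUrl) {
            // Suppress retries for the initial URL so it is not retried or re-enqueued.
            log.warning(`Not retrying ${request.url}: ${error.message}`);
            request.noRetry = true;
        }
    },
}, config);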
2 comments
All the data I need is in the response to a specific request, which occurs before the page finishes loading. What I'm trying to achieve is to close the page and move on to the next request as soon as I have what I need, so I tried doing it in preNavigationHooks:

Plain Text
preNavigationHooks: [
    async (crawlingContext, gotoOptions) => {
        const { page, request, log } = crawlingContext;
        gotoOptions.waitUntil = 'load';
        if (isProductUrl) {
            page.on('response', async (response) => {
                if (response.request().url().includes('productdetail')) {
                    try {
                        const data = await response.json();
                        await Actor.pushData(data);
                        await defaultQueue.markRequestHandled(request);
                        page.removeAllListeners('response');
                        await page.close();
                    } catch (err) {
                        log.error(err);
                    }
                }
            });
        }
    },
]

But I'm getting this error:

WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Navigation failed because browser has disconnected!

When I remove the await page.close(); line, I get this error instead:

WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. requestHandler timed out after 130 seconds (o4wrbxkzgU1eP2n).

This is the default handler. It only contains code related to enqueuing URLs:

Plain Text
router.addDefaultHandler(async ({ request }) => {
    if (searchPageUrlPattern.test(request.url)) {
        // Enqueue links...
    }
});
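
A possible alternative, sketched below under stated assumptions rather than as a confirmed fix: instead of closing the page from the hook, start page.waitForResponse() in the preNavigation hook (so the early response is not missed), stash the promise on the crawling context, and await it in the handler; returning from the handler then lets the crawler mark the request handled and dispose of the page itself. Here isProductUrl is assumed to work as a predicate on the URL, and productResponse is a hypothetical property name.

Plain Text
preNavigationHooks: [
    async (crawlingContext, gotoOptions) => {
        const { page, request } = crawlingContext;
        gotoOptions.waitUntil = 'domcontentloaded';
        if (isProductUrl(request.url)) {
            // Start listening before navigation, but don't await here --
            // just stash the promise so the handler can pick it up later.
            crawlingContext.productResponse = page.waitForResponse(
                (response) => response.url().includes('productdetail'),
            );
        }
    },
],

router.addDefaultHandler(async (crawlingContext) => {
    const { request } = crawlingContext;
    if (searchPageUrlPattern.test(request.url)) {
        // Enqueue links...
        return;
    }
    if (crawlingContext.productResponse) {
        const response = await crawlingContext.productResponse;
        await Actor.pushData(await response.json());
        // Returning normally lets the crawler mark the request handled and close the page.
    }
});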
1 comment