Apify and Crawlee Official Forum

Crawler works locally but not on cloud

Hello, I've built a Puppeteer crawler, nothing special about it.
It works flawlessly locally. I deployed it to AWS Batch on Fargate and got navigation timeouts after 60 seconds. I switched to EC2 and got the same navigation timeouts after 60 seconds. I increased the navigation timeout to 120 seconds and got the same error.
I switched proxies between BrightData and OxyLabs: same issue.
I deployed to Apify: same issue.

I'm going out of my mind trying to understand why this is happening.
17 comments
Hi, could you send me your run ID in a DM so I can check it? Or could you share part of the implementation?
I can share the run ID in a DM.
OK, so far it looks like the actor was run on the Apify platform with 8 GB of RAM, which gives it 2 CPUs, but locally it had many more CPUs available 🙂

We solved it by lowering maxConcurrency.
I kept testing, and from what I see I have to allocate 1 vCPU per concurrent request.
That is kind of weird; generally 4 GB with 1 CPU should handle 4 concurrent requests without issues :\
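As a rough sketch of the sizing rule discussed above, the snippet below derives a `maxConcurrency` cap from a CPU budget. The helper names `concurrencyCap` and `apifyCpuCores` are hypothetical, and the default of one page per CPU only mirrors the observation in this thread; it is a tuning assumption, not a Crawlee default. The 4096 MB per core ratio matches the 8 GB / 2 CPU and 4 GB / 1 CPU figures mentioned here.

```javascript
const os = require('node:os');

// Hypothetical helper: turn a CPU budget into a maxConcurrency cap.
// pagesPerCpu = 1 reflects the "1 vCPU per concurrent request" finding
// in this thread; treat it as a starting point to tune, not a rule.
function concurrencyCap(cpuCount, pagesPerCpu = 1) {
    if (!Number.isFinite(cpuCount) || cpuCount <= 0) return 1;
    return Math.max(1, Math.floor(cpuCount * pagesPerCpu));
}

// On the Apify platform, 4096 MB of Actor memory corresponds to 1 CPU core.
function apifyCpuCores(memoryMbytes) {
    return memoryMbytes / 4096;
}

// Caveat: os.cpus().length may report the host's cores rather than a
// container CPU limit, so pass the limit explicitly when running in Docker.
const localCap = concurrencyCap(os.cpus().length);
const apifyCap = concurrencyCap(apifyCpuCores(8192)); // 8 GB -> 2 cores

console.log({ localCap, apifyCap });
```

The resulting number would then be passed as `maxConcurrency` in the crawler options, for example `maxConcurrency: concurrencyCap(8, 0.75)` for the 8 vCPU / 6 requests case reported later in the thread.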
I'm building the Docker container locally and watching the numbers in docker stats.
If I go with 8 concurrent requests and 8 vCPUs, I get timeouts :))
If I go with 6 concurrent requests and 8 vCPUs, it works almost fine.
RAM usage is low: 900 MB out of 8 GB.
Generally, I would like to see whether the same thing occurs on other websites as well, and whether you are using any 3rd-party extensions that may be the cause :\
// proxyConfiguration, router, and abortAssets are defined elsewhere in the project.
import { PuppeteerCrawler, log } from 'crawlee';

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    launchContext: {
        // launcher: puppeteerExtra,
        // useIncognitoPages: true,
        launchOptions: {
            // Chromium flags meant to reduce CPU load in a headless container.
            args: [
                '--disable-gpu',
                '--disable-extensions',
                '--disable-webgl',
                '--disable-dev-shm-usage',
                '--disable-accelerated-2d-canvas',
                '--disable-accelerated-jpeg-decoding',
                '--disable-accelerated-mjpeg-decode',
                '--disable-accelerated-video-decode',
                '--disable-software-rasterizer',
                '--disable-notifications',
                '--disable-background-networking',
                '--disable-background-timer-throttling',
            ],
        },
    },
    requestHandler: router,
    maxConcurrency: 6,
    maxRequestRetries: 50,
    useSessionPool: true,
    navigationTimeoutSecs: 90,
    requestHandlerTimeoutSecs: 90,
    // Called once a request has used up all of its retries.
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed after ${request.retryCount} retries.`);
    },
    preNavigationHooks: [
        abortAssets,
        // Resolve navigation at DOMContentLoaded instead of waiting for the full load event.
        async (_, gotoOptions) => {
            gotoOptions.waitUntil = 'domcontentloaded';
        },
    ],
});

3rd-party extensions are disabled.
I also added those args to put less stress on the CPU.
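The `abortAssets` hook referenced in the config is not shown in the thread. A minimal sketch of what such a pre-navigation hook might look like is below, assuming it uses Puppeteer request interception to skip heavy resources. The blocked resource types and the `shouldAbort` helper are assumptions for illustration, not the poster's actual code.

```javascript
// Resource types to skip. This list is an assumption; adjust it per site.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Pure helper so the filtering rule can be checked without a browser.
function shouldAbort(resourceType) {
    return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Hypothetical pre-navigation hook; Crawlee passes the crawling context,
// which exposes the Puppeteer page for the upcoming navigation.
async function abortAssets({ page }) {
    await page.setRequestInterception(true);
    page.on('request', (req) => {
        // Swallow rejections from requests that were already handled
        // or whose page was closed mid-flight.
        if (shouldAbort(req.resourceType())) {
            req.abort().catch(() => {});
        } else {
            req.continue().catch(() => {});
        }
    });
}
```

Note that enabling request interception in Puppeteer disables page caching, so whether this reduces or increases load can depend on the site.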
And the problem is still there?
Yep. I'm now trying to run 24 concurrent requests with 8 vCPUs and 8 GB of RAM.
I want to specify a browser profile when using Puppeteer. I did it like this, but it doesn't work. Can someone help me?
[Attachment: image.png]