Apify

Apify and Crawlee Official Forum


Add certificates to Playwright crawler using Chromium

Hey folks, we are trying to integrate a proxy into our crawlers, and the issue is that the proxy needs a certificate to be present before it will let us authenticate. I couldn't find any option for this in the documentation.

Is there a way I can add those certs in Crawlee/Playwright? Alternatively, if Crawlee exposes Playwright's agentOptions anywhere (I couldn't find it in the docs), that would also work, as per https://github.com/microsoft/playwright/issues/1799#issuecomment-959011162
P.S. I have added that certificate on my server and curl works fine, but Crawlee doesn't, so I'm assuming Crawlee is not picking it up.
The error I'm getting is: page.goto: net::ERR_PROXY_CONNECTION_FAILED at
Hi,
Crawlee uses Playwright under the hood, so you should be able to intercept requests in the usual way. Here is an example I found for Playwright itself: https://github.com/microsoft/playwright/issues/1799#issuecomment-959011162

Can you put together a minimal working example using only Playwright (without Crawlee) to confirm that the issue is in Crawlee and not in Playwright itself? I found a lot of issues regarding the use of certificates in Playwright.
Hey, when I use Playwright, it gives me an invalid certificate error, which I can bypass by using:
Plain Text
launchOptions: {
    args: ['--ignore-certificate-errors'],
},
But with Crawlee it's not working.
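For reference, the flag goes in a different place in each API: plain Playwright takes it in the options passed to `chromium.launch()`, while Crawlee's `PlaywrightCrawler` takes it under `launchContext.launchOptions` (the structure used in the configuration posted later in this thread). Below is a minimal sketch using a hypothetical helper, `withIgnoreCertErrors`, that only builds the option objects, so it runs without Playwright or Crawlee installed. Note that `--ignore-certificate-errors` only tells Chromium to ignore certificate validation errors; it does not install or present any certificate.

```javascript
// Hypothetical helper: returns a copy of a Playwright launchOptions object with
// '--ignore-certificate-errors' appended to `args` (added at most once).
function withIgnoreCertErrors(launchOptions = {}) {
    const flag = '--ignore-certificate-errors';
    const args = launchOptions.args ?? [];
    return {
        ...launchOptions,
        args: args.includes(flag) ? [...args] : [...args, flag],
    };
}

// Plain Playwright: this object would be passed straight to chromium.launch().
const playwrightLaunchOptions = withIgnoreCertErrors({ headless: true });

// Crawlee: the same object is nested under launchContext.launchOptions
// in the options passed to new PlaywrightCrawler().
const crawleeOptions = {
    launchContext: {
        launchOptions: withIgnoreCertErrors({ headless: true }),
    },
};
```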
I think I got it wrong: I don't need to use the proxy's certificate with Playwright/Crawlee.
It's just a proxy config issue.
Here's the full error: page.goto: net::ERR_PROXY_CONNECTION_FAILED
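`ERR_PROXY_CONNECTION_FAILED` is a Chromium network error meaning the browser could not connect to the configured proxy server at all, so it points at the proxy settings rather than at the target site or its certificate. When many requests fail, it can help to separate proxy-level failures from other errors. A minimal sketch, assuming a hypothetical helper `isProxyConnectionError` that matches on the error message; the list of codes is illustrative, not exhaustive.

```javascript
// Chromium network error codes that indicate a problem reaching the proxy
// itself (illustrative list, not exhaustive).
const PROXY_ERROR_CODES = [
    'net::ERR_PROXY_CONNECTION_FAILED',
    'net::ERR_TUNNEL_CONNECTION_FAILED',
];

// Hypothetical helper: true when an error (or error message string)
// contains one of the proxy-level error codes above.
function isProxyConnectionError(error) {
    const message = String(error?.message ?? error ?? '');
    return PROXY_ERROR_CODES.some((code) => message.includes(code));
}
```

In Crawlee, a check like this could be called from a `failedRequestHandler`, which receives the error as its second argument.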
We bypassed it by avoiding the cert route, and it's working fine for us.
Does the same proxy configuration work for other websites?
Not with Crawlee, but with Playwright, yes.
It worked with Crawlee once we fully onboarded with the proxy provider, and we didn't need to use their cert.
Can you please provide a code snippet with your current configuration for Crawlee?
Plain Text
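// Assumed imports: `chromium.use()` implies playwright-extra, and stealthPlugin
// is assumed to be puppeteer-extra-plugin-stealth.
import { PlaywrightCrawler, ProxyConfiguration, RequestQueue, Log, log } from 'crawlee';
import { chromium } from 'playwright-extra';
import stealthPlugin from 'puppeteer-extra-plugin-stealth';

// initRouter, CrawlerLogger, resume, numPages, initialPage, useProxy and proxy
// are defined elsewhere in our code.
chromium.use(stealthPlugin());

// Start every run from an empty request queue.
let queue = await RequestQueue.open('crawler');
await queue.drop();
queue = await RequestQueue.open('crawler');

const startUrls = [`url`];
const router = await initRouter({ resume, numPages, initialPage });

const crawler = new PlaywrightCrawler({
    requestHandler: router,
    maxRequestsPerMinute: 100,
    log: new Log({
        logger: new CrawlerLogger(log.getOptions(), 'CRAWLER_1'), // custom logger implementation, not relevant here
        level: log.LEVELS.DEBUG,
    }),
    requestQueue: queue,
    launchContext: {
        launcher: chromium, // the stealth-enabled playwright-extra launcher
        launchOptions: {
            args: ['--ignore-certificate-errors'],
        },
    },
    // Proxy and session settings are only added when useProxy is truthy.
    ...(useProxy && {
        proxyConfiguration: new ProxyConfiguration({
            proxyUrls: [proxy],
        }),
        useSessionPool: true,
        persistCookiesPerSession: true,
    }),
});

await crawler.run(startUrls, {});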
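One difference from the pure Playwright setup is worth noting: `chromium.launch()` takes `server`, `username` and `password` as separate fields, while Crawlee's `ProxyConfiguration` takes complete proxy URLs in `proxyUrls`, so any credentials have to be embedded in the URL as `http://username:password@host:port`. A minimal sketch, assuming a hypothetical helper `buildProxyUrl`; the host and port are placeholders. Percent-encoding the credentials keeps characters such as `@` or `:` in a password from breaking URL parsing.

```javascript
// Hypothetical helper: builds a proxy URL for ProxyConfiguration's proxyUrls,
// percent-encoding the username and password.
function buildProxyUrl({ protocol = 'http', host, port, username, password }) {
    const auth = username
        ? `${encodeURIComponent(username)}:${encodeURIComponent(password ?? '')}@`
        : '';
    return `${protocol}://${auth}${host}:${port}`;
}

const proxy = buildProxyUrl({
    host: 'proxy.example.com', // placeholder host
    port: 8000,                // placeholder port
    username: 'username',
    password: 'p@ss:word',
});
// proxy === 'http://username:p%40ss%3Aword@proxy.example.com:8000'
```

The resulting string would then be passed as `proxyUrls: [proxy]`.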
Thank you for your feedback, I am currently investigating this with the Crawlee developer team.

Would it be possible to also share the pure Playwright code that is currently working for you? Is the certificate taken from the system, or are you importing it at the application level?
It's taken from the system.
I was trying to figure out how to do it at the app level but couldn't make it work.
In the end, the system level worked fine.
Here's the pure Playwright code:
Plain Text
import { chromium } from 'playwright';

const browser = await chromium.launch({
    proxy: {
        server: 'proxy_url',
        username: 'username',
        password: 'pwd',
    },
    args: ['--ignore-certificate-errors'],
});

const page = await browser.newPage();

await page.goto('https://google.com');
const html = await page.innerHTML('body');
console.log(html);

// Release the browser process when done.
await browser.close();
I think it was an issue on our end, because after fully activating our account with the proxy provider, it worked just fine. The only issue we're currently facing is that a lot of our requests fail with the proxy, but that's unrelated to this and probably a config issue.
You should be able to replicate this setup in Crawlee:
Plain Text
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // ... ,
    launchContext: {
        launchOptions: {
            // Passed straight through to Playwright's chromium.launch()
            proxy: {
                server: 'http://proxy_url',
                username: 'username',
                password: 'password',
            },
            args: ['--ignore-certificate-errors'],
        },
    },
});

And drop the proxyConfiguration option.
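If the proxy is already stored as a single URL (the form `ProxyConfiguration` uses), it can be split into the separate `server`, `username` and `password` fields that the `launchOptions.proxy` object above expects. A minimal sketch, assuming a hypothetical helper `launchContextFromProxyUrl` and a placeholder URL; it relies only on Node's built-in `URL` class.

```javascript
// Hypothetical helper: converts a proxy URL into the launchContext shape used
// above, with the proxy passed to Playwright directly (no ProxyConfiguration).
function launchContextFromProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    // url.host includes the port when it is not the protocol's default.
    const proxy = { server: `${url.protocol}//${url.host}` };
    // URL keeps credentials percent-encoded, so decode them for Playwright.
    if (url.username) proxy.username = decodeURIComponent(url.username);
    if (url.password) proxy.password = decodeURIComponent(url.password);
    return {
        launchOptions: {
            proxy,
            args: ['--ignore-certificate-errors'],
        },
    };
}

const launchContext = launchContextFromProxyUrl(
    'http://username:p%40ss@proxy.example.com:8000', // placeholder URL
);
```

The result would be passed as `new PlaywrightCrawler({ launchContext, ... })`. One likely trade-off: since Crawlee no longer manages the proxy, every browser uses the same fixed proxy instead of Crawlee's per-session proxy rotation.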

And please let me know if it helped 🙂
Hey, thanks! Really appreciate it.
I can't reproduce the original issue because we are not relying on the certs anymore, but this method also works for us.