Apify and Crawlee Official Forum

Updated last month

Which browser is best for crawling?

As the title says.

I'm currently using Chromium, but it is CPU-heavy.

Killing the browser does not kill the process, so it's easy to hit 100% CPU usage pretty quickly.

(I'm crawling thousands of websites, looking for different data on each.) I already try to load pure HTML without CSS, images, and other assets. That helped a lot, but the issue is still there.
3 comments
Hi @h
I also recommend blocking unnecessary network requests with the blockRequests helper.
Make sure you are running it in headless mode.
You could also try Cheerio if the use case allows it.

Regarding your question about the browser:
Firefox tends to be lighter on CPU usage.
Yes, I already do that:
const launchContext: PlaywrightLaunchContext = {
  launcher: firefox,
  launchOptions: {
    headless: false,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
    ],
  },
  useChrome: false, // Use Chromium instead of Chrome for better performance
  userAgent: userAgents[Math.floor(Math.random() * userAgents.length)],
}
...
  launchContext,
  preNavigationHooks: [
    async ({ page }) => {
      await playwrightUtils.blockRequests(page, {
        urlPatterns: [
          '.png',
          '.jpg',
          '.jpeg',
          '.gif',
          '.svg',
          '.ico',
          '.woff',
          '.woff2',
          'adsbygoogle.js',
        ],
        extraUrlPatterns: ['adsbygoogle.js'],
      })

      await playwrightUtils.closeCookieModals(page)
    },
  ],

Unfortunately, I receive: WARN Playwright Utils: blockRequests() helper is incompatible with non-Chromium browsers.
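For anyone hitting the same warning: as far as I understand, the helper relies on Chromium-only functionality, so with Firefox or WebKit the blocking has to go through Playwright's own page.route() instead. A minimal sketch of picking the approach per browser. The blockingStrategy function and its return labels are hypothetical names made up for illustration, not part of Crawlee or Playwright.

```typescript
// Browser names as reported by Playwright's BrowserType.name().
type BrowserName = 'chromium' | 'firefox' | 'webkit';

// Hypothetical helper (not a Crawlee API): choose how to block requests.
// 'blockRequests' = use playwrightUtils.blockRequests (Chromium only, per the warning).
// 'route'         = fall back to page.route() with a manual filter.
function blockingStrategy(name: BrowserName): 'blockRequests' | 'route' {
  return name === 'chromium' ? 'blockRequests' : 'route';
}

// Possible use inside a pre-navigation hook (assumes a Playwright page):
//   const name = page.context().browser()?.browserType().name();
//   if (name && blockingStrategy(name as BrowserName) === 'route') { /* page.route(...) */ }
```

The point is only to avoid calling the helper where it is known not to work; the actual filtering logic is a separate concern.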


I didn't know that 😄
You can block requests manually (I mean without using the utility function).
Example:

const BLOCKED = ['image', 'stylesheet', 'media', 'font', 'other'];

Then, within your crawler's preNavigationHooks, add this function:
async ({ page }) => {
    // Abort every request whose resource type is on the blocklist; let the rest through.
    await page.route('**/*', (route) => {
        if (BLOCKED.includes(route.request().resourceType())) return route.abort();
        return route.continue();
    });
};
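If you also want to keep the URL-based blocking from the earlier config (e.g. 'adsbygoogle.js'), the two checks can be combined in one route handler. A rough sketch, not the official way to do it: shouldBlock, handleRoute, RouteLike, and RequestLike are names invented here, and the small interfaces exist only so the snippet compiles without Playwright installed (Playwright's real Route and Request objects should satisfy them). It leaves out 'other' on the assumption that this type may cover requests a page actually needs; add it back if that works for your sites.

```typescript
// Minimal structural types (hypothetical) mirroring the parts of
// Playwright's Route and Request that the handler uses.
interface RequestLike {
  resourceType(): string;
  url(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Resource types to drop, taken from the reply above minus 'other'.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'media', 'font']);
// URL substrings to drop regardless of resource type.
const BLOCKED_URL_PARTS = ['adsbygoogle.js'];

// Pure decision function, kept separate so it is easy to unit test.
function shouldBlock(resourceType: string, url: string): boolean {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return BLOCKED_URL_PARTS.some((part) => url.includes(part));
}

// Route handler: abort blocked requests, continue everything else.
function handleRoute(route: RouteLike): Promise<void> {
  const request = route.request();
  return shouldBlock(request.resourceType(), request.url())
    ? route.abort()
    : route.continue();
}

// Inside a pre-navigation hook (assumes a Playwright page):
//   await page.route('**/*', handleRoute);
```

Keeping the decision in a plain function means you can tune the lists and test them without launching a browser.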