Apify Discord Mirror

Updated last month

Moving from Playwright to Crawlee/Playwright for Scraping

At a glance
The community member's post asks if there are any resources on building a scraper with Crawlee, apart from the ones in the documentation, and where to set the browser context. The post includes a code snippet for launching a Playwright browser and setting up the browser context. The comments provide some examples and suggestions. One community member suggests looking at the Playwright crawler example in the Apify SDK documentation. Another community member provides an example of using the preNavigationHooks option in the PlaywrightCrawler to set the user agent and emulate a specific device. There is no explicitly marked answer in the comments.
Are there actually any resources on building a scraper with Crawlee other than the ones in the docs?
Where do I set the browser context options, for example?

import { chromium } from "playwright";

const launchPlaywright = async () => {
  const browser = await chromium.launch({
    headless: true,
    args: ["--disable-blink-features=AutomationControlled"],
  });

  const context = await browser.newContext({
    viewport: { width: 1280, height: 720 },
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
    geolocation: { longitude: 7.8421, latitude: 47.9978 },
    permissions: ["geolocation"],
    locale: "en-US",
    storageState: "playwright/auth/user.json",
  });
  return await context.newPage();
};
2 comments
Or within the pre navigation hook

Something like:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  preNavigationHooks: [
    async ({ page }) => {
      // Add a session cookie to the page's browser context.
      // Playwright needs either `url` or both `domain` and `path` per cookie.
      await page.context().addCookies([
        { name: 'session', value: '12345', domain: 'example.com', path: '/' },
      ]);

      // Send a mobile User-Agent header. Playwright has no page.setUserAgent()
      // (that is a Puppeteer method); this only changes the HTTP header,
      // not navigator.userAgent.
      await page.setExtraHTTPHeaders({
        'User-Agent':
          'Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1',
      });
    },
  ],
  requestHandler: async ({ page, request }) => {
    console.log(`Visiting ${request.url}`);
    const content = await page.content();
    console.log(`Content length: ${content.length}`);
  },
});

await crawler.run(['https://example.com']);
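
Options that Playwright only accepts when the context is created (viewport, geolocation, permissions, locale, storageState) cannot be changed from a pre-navigation hook, because the page already exists by then. A rough sketch of one way to carry the settings from the question over to Crawlee is below. It assumes that launchContext.launchOptions is forwarded to the browser launch and that a browserPoolOptions.prePageCreateHooks entry can fill in the options for each new page when useIncognitoPages is on. buildCrawlerOptions is a made-up helper name, not part of Crawlee, and the behavior should be checked against the Crawlee version in use.

```javascript
// Hypothetical helper (not part of Crawlee): splits plain Playwright settings
// into the two PlaywrightCrawler options that are assumed to consume them.
function buildCrawlerOptions(launchOptions, contextOptions) {
  return {
    launchContext: {
      // Assumed to be passed through to the browser launch call.
      launchOptions,
      // Assumption: per-page context options are only honored when each
      // page gets its own incognito context.
      useIncognitoPages: true,
    },
    browserPoolOptions: {
      // Assumption: with fingerprinting left on, Crawlee may supply its own
      // user agent and viewport, so it is switched off here.
      useFingerprints: false,
      prePageCreateHooks: [
        // Called before each page is opened; copying into pageOptions is
        // assumed to feed these values into the new browser context.
        (_pageId, _browserController, pageOptions) => {
          if (pageOptions) Object.assign(pageOptions, contextOptions);
        },
      ],
    },
  };
}

// The launch and context settings from the question, split into two objects.
const options = buildCrawlerOptions(
  { headless: true, args: ['--disable-blink-features=AutomationControlled'] },
  {
    viewport: { width: 1280, height: 720 },
    geolocation: { longitude: 7.8421, latitude: 47.9978 },
    permissions: ['geolocation'],
    locale: 'en-US',
    storageState: 'playwright/auth/user.json',
  },
);

// Usage (requires the crawlee package):
// const crawler = new PlaywrightCrawler({ ...options, requestHandler });
```

The helper only builds a plain object, so it can be logged and inspected before handing it to the crawler.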