useFingerprints: true, useFingerprintCache: false, launcher: firefox
pluginContent (a string) taken from here: https://discord.com/channels/801163717915574323/1059483872271798333

preNavigationHooks: [
    async ({ page, request }) => {
        await page.addInitScript({ content: pluginContent });
    },
],
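A minimal sketch of the same hook wrapped in a small factory, assuming pluginContent is the string from the linked Discord thread. makeInitScriptHooks is a hypothetical name, not a Crawlee API. Playwright keeps every script passed to page.addInitScript() and runs each one in every new document before the page's own scripts, so the WeakSet guard keeps the plugin code from being registered twice if the same page object ever goes through the hook again.

```javascript
// Hypothetical factory (not part of Crawlee): returns a preNavigationHooks
// array that registers `pluginContent` as an init script once per page.
function makeInitScriptHooks(pluginContent) {
    // Page objects that already have the init script registered.
    const patchedPages = new WeakSet();

    return [
        async ({ page }) => {
            if (patchedPages.has(page)) return;
            // Playwright evaluates init scripts in every new document of this
            // page, before any of the page's own scripts run.
            await page.addInitScript({ content: pluginContent });
            patchedPages.add(page);
        },
    ];
}
```

In the crawler options this would be used as preNavigationHooks: makeInitScriptHooks(pluginContent).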
crawlee/core 3.3.1, playwright 1.33.0, npm 8.19.3, node 16.19.0
npm update playwright
npm update crawlee
~/.cache/ms-playwright/firefox-1403/
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    // ...
    browserPoolOptions: {
        useFingerprints: true,
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: ['firefox'],
                operatingSystems: ['linux'],
            },
        },
    },
    launchContext: { launcher: firefox },
});
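One thing worth ruling out with this kind of setup is a mismatch between the engine that actually runs and the fingerprint that gets injected (for example a Chrome fingerprint on Firefox). A minimal sketch of such a check, assuming the option shape shown above; fingerprintMatchesLauncher and the engine-to-browser-name map are hypothetical, not part of Crawlee. It relies only on Playwright's BrowserType.name().

```javascript
// Hypothetical map (assumption): Playwright engine name -> fingerprint-generator
// browser names that make sense for that engine.
const FINGERPRINT_NAMES = {
    chromium: ['chrome', 'edge'],
    firefox: ['firefox'],
    webkit: ['safari'],
};

// Hypothetical check (not a Crawlee API): true if every entry in
// fingerprintGeneratorOptions.browsers fits the engine of launchContext.launcher.
function fingerprintMatchesLauncher(crawlerOptions) {
    const launcher = crawlerOptions.launchContext?.launcher;
    const fingerprintOptions = crawlerOptions.browserPoolOptions?.fingerprintOptions;
    const browsers = fingerprintOptions?.fingerprintGeneratorOptions?.browsers ?? [];
    if (!launcher || typeof launcher.name !== 'function' || browsers.length === 0) {
        return false;
    }
    // Playwright's BrowserType.name() returns 'chromium', 'firefox' or 'webkit'.
    const allowed = FINGERPRINT_NAMES[launcher.name()] ?? [];
    // Entries may be plain names or objects such as { name: 'firefox', minVersion: 100 }.
    return browsers.every((entry) =>
        allowed.includes(typeof entry === 'string' ? entry : entry.name),
    );
}
```

Calling it on the options object before passing that object to new PlaywrightCrawler(...) gives a quick sanity check; it says nothing about whether the injected fingerprint is convincing, only that it targets the right browser.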
import { firefox } from 'playwright-extra';
import stealthPlugin from 'puppeteer-extra-plugin-stealth';
firefox.use(stealthPlugin());

With useFingerprints: true and launcher: firefox in code:

INFO PlaywrightCrawler: Starting the crawler.
An error occured while executing "onPageCreated" in plugin "stealth/evasions/user-agent-override": TypeError: Cannot read properties of undefined (reading 'userAgent')
    at Proxy.<anonymous> (.../node_modules/playwright-extra/src/puppeteer-compatiblity-shim/index.ts:217:23)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processImmediate (node:internal/timers:437:9)
    at process.topLevelDomainCallback (node:domain:161:15)
    at process.callbackTrampoline (node:internal/async_hooks:128:24)
    at async Plugin.onPageCreated (.../node_modules/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js:69:8)
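One option that may be worth trying, not verified against this exact setup: drop the evasion that crashes, on the assumption that Crawlee's fingerprint injection already sets the user agent when useFingerprints is true, so the stealth user-agent override overlaps with it anyway. puppeteer-extra-plugin-stealth exposes the evasions it will load as a Set on enabledEvasions; withoutEvasions below is a hypothetical helper.

```javascript
// Hypothetical helper: remove named evasions from a stealth plugin instance.
// `enabledEvasions` is the Set of evasion names the stealth plugin will load,
// so call this before handing the plugin to firefox.use().
function withoutEvasions(stealth, evasionNames) {
    for (const name of evasionNames) {
        stealth.enabledEvasions.delete(name);
    }
    return stealth;
}
```

Usage with the imports above: firefox.use(withoutEvasions(stealthPlugin(), ['user-agent-override']));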
retireBrowserAfterPageCount=2 in browserPoolOptions: this gives a unique fingerprint every two requests, which... isn't perfect (and starting a new browser instance so often looks strange).
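The cost of that workaround is easy to put a number on. Assuming each request opens exactly one page and nothing is retried, the number of browser launches is at least ceil(requests / retireBrowserAfterPageCount): with a value of 2, 1,000 requests need at least 500 launches. minBrowserLaunches is a hypothetical helper for this arithmetic only.

```javascript
// Hypothetical helper: lower bound on browser launches when each browser is
// retired after `pagesPerBrowser` pages (one page per request, no retries).
function minBrowserLaunches(requestCount, pagesPerBrowser) {
    if (!Number.isInteger(pagesPerBrowser) || pagesPerBrowser < 1) {
        throw new RangeError('pagesPerBrowser must be a positive integer');
    }
    return Math.ceil(requestCount / pagesPerBrowser);
}
```

For example, minBrowserLaunches(1000, 2) is 500, while minBrowserLaunches(1000, 50) is 20. Browsers retired early because of errors only push the real number higher.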
If set to true, the crawler will automatically try to bypass any detected bot protection.
Currently supports:
- Cloudflare Bot Management
- Google Search Rate Limiting

maxRequestRetries=0: is it OK to use retryOnBlocked in such a case?
content = await page.content();

page.content: Target page, context or browser has been closed
    at (<somewhere-in-my-code>.js:170:54)
    at PlaywrightCrawler.requestHandler (<somewhere-in-my-code>.js:596:15)
    at async wrap (.../node_modules/@apify/timeout/index.js:52:21)

page.content()?
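The error itself means the page (or its context or browser) was already closed when page.content() ran. Possible causes include the handler running past requestHandlerTimeoutSecs, after which the crawler moves on and closes the page, or a promise somewhere in the handler chain that isn't awaited; the trace alone doesn't tell which. A minimal sketch of a guard, assuming it is acceptable to skip extraction for a page that is already gone. contentOrNull is a hypothetical helper; page.isClosed() and page.content() are standard Playwright Page methods. It doesn't fix the underlying cause, it only turns the crash into a value the handler can check.

```javascript
// Hypothetical helper: return the page HTML, or null if the page has
// already been closed.
async function contentOrNull(page) {
    if (page.isClosed()) return null;
    try {
        return await page.content();
    } catch (err) {
        // The page may close between the check above and the call.
        if (page.isClosed()) return null;
        // Any other failure is a real error and is passed on.
        throw err;
    }
}
```

In the handler: const content = await contentOrNull(page); followed by an early return (or a log line) when content is null.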
- useSessionPool: false and persistCookiesPerSession: false
- launcher and, in fingerprintGeneratorOptions, browsers
- fingerprintGeneratorOptions devices: ['desktop']
- launchContext: { useIncognitoPages: true }
- preNavigationHooks to fix the "plugin length" problem, as described here: https://discord.com/channels/801163717915574323/1059483872271798333
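Put together, the settings listed above correspond roughly to the following options object. This is a sketch, not the poster's actual code: buildCrawlerOptions is hypothetical, launcher and pluginContent are passed in rather than imported so the object can be inspected on its own, and operatingSystems: ['linux'] is carried over from the earlier snippet. The result would be passed to new PlaywrightCrawler(...).

```javascript
// Hypothetical builder for the PlaywrightCrawler options listed above.
// `launcher` is a Playwright browser type (e.g. `firefox`), `pluginContent`
// the init-script string from the Discord thread.
function buildCrawlerOptions({ launcher, pluginContent }) {
    return {
        // No session pool, and no per-session cookie persistence either.
        useSessionPool: false,
        persistCookiesPerSession: false,
        launchContext: {
            launcher,
            // Each page is opened in its own incognito browser context.
            useIncognitoPages: true,
        },
        browserPoolOptions: {
            useFingerprints: true,
            fingerprintOptions: {
                fingerprintGeneratorOptions: {
                    browsers: ['firefox'], // keep in sync with `launcher`
                    devices: ['desktop'],
                    operatingSystems: ['linux'],
                },
            },
        },
        preNavigationHooks: [
            async ({ page }) => {
                // Inject the "plugin length" workaround before page scripts run.
                await page.addInitScript({ content: pluginContent });
            },
        ],
    };
}
```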
INFO Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,

... new PlaywrightCrawler({ autoscaledPoolOptions: { loggingIntervalSecs: null,
https://www.googletagmanager.com/gtag/js?id=...
https://connect.facebook.net/en_US/fbevents.js
https://www.google-analytics.com/analytics.js
https://fonts.googleapis.com/css?family=Lato