Apify and Crawlee Official Forum

Saving bandwidth with PlaywrightCrawler: blocking googletagmanager, google-analytics, etc.

I already block images as described in [1], and this helps to save some bandwidth.
Next step: looking at the statistics in my proxy service, I see a significant number of requests like these:

https://www.googletagmanager.com/gtag/js?id=...
https://connect.facebook.net/en_US/fbevents.js
https://www.google-analytics.com/analytics.js
https://fonts.googleapis.com/css?family=Lato


Can somebody show me example code that blocks these domains? (Even better: code that blocks all domains from a given list.)

I assume it should be something in PlaywrightCrawler.preNavigationHooks, right?
Prerequisites: PlaywrightCrawler with Firefox as the launcher (Chrome-specific hacks probably would not work).

(I'm not good at writing JavaScript from scratch, so I need some help.)

[1] https://discord.com/channels/801163717915574323/1060986956961546320
1 comment
There is https://crawlee.dev/api/playwright-crawler/namespace/playwrightUtils#blockRequests, but it is only available in Chromium.

For Firefox, you need to use Playwright's routing instead. It is less optimized, since enabling routing disables the HTTP cache, and that can backfire:
https://playwright.dev/docs/api/class-page#page-route
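
A minimal sketch of the routing approach, assuming a blocklist made of the four hosts from the question. The names `BLOCKED_DOMAINS`, `isBlockedUrl` and `blockDomainsHook` are made up for illustration; only `page.route()`, `route.abort()`, `route.continue()` and `preNavigationHooks` come from Playwright and Crawlee. Treat it as a starting point, not a tuned solution, and check your proxy statistics afterwards, since the disabled cache may cost more than the blocking saves.

```javascript
// Hypothetical blocklist built from the hosts named in the question.
const BLOCKED_DOMAINS = [
    'googletagmanager.com',
    'connect.facebook.net',
    'google-analytics.com',
    'fonts.googleapis.com',
];

// Returns true when the URL's hostname equals a listed domain
// or is a subdomain of one (e.g. www.googletagmanager.com).
function isBlockedUrl(url, domains = BLOCKED_DOMAINS) {
    let hostname;
    try {
        hostname = new URL(url).hostname;
    } catch {
        return false; // not a parseable absolute URL, so let it through
    }
    return domains.some((d) => hostname === d || hostname.endsWith(`.${d}`));
}

// Pre-navigation hook: registers a catch-all Playwright route on the page
// before navigation, aborting blocked requests and continuing the rest.
async function blockDomainsHook({ page }) {
    await page.route('**/*', (route) => (
        isBlockedUrl(route.request().url()) ? route.abort() : route.continue()
    ));
}

// Wiring into Crawlee (left as comments so the helpers above run on their own):
//
// import { PlaywrightCrawler } from 'crawlee';
// import { firefox } from 'playwright';
//
// const crawler = new PlaywrightCrawler({
//     launchContext: { launcher: firefox },
//     preNavigationHooks: [blockDomainsHook],
//     requestHandler: async ({ page, request }) => {
//         // your scraping logic here
//     },
// });
```

Matching on the hostname rather than on the full URL string avoids accidentally blocking a page that merely mentions one of these domains in its query string.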