Apify and Crawlee Official Forum

Louis Deconinck
Joined September 25, 2024
I am getting the error message below. What is the best way to deal with it?
Reclaiming failed request back to the list or queue. Redirected 10 times. Aborting.
Can I increase the max number of redirects for my CheerioCrawler?
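One possible approach, sketched under assumptions: CheerioCrawler's `preNavigationHooks` receive the crawling context and the request options of the underlying HTTP client, and got (the default client behind it) has a `maxRedirects` option that defaults to 10, which matches the "Redirected 10 times" message. `raiseRedirectLimit` and `RedirectOptions` are hypothetical names made up for this sketch, and the crawler wiring is left in a comment because it needs the `crawlee` package.

```typescript
// Minimal local shape of the request options a pre-navigation hook receives.
// Assumption: the HTTP client honours a `maxRedirects` field, as got does.
interface RedirectOptions {
    maxRedirects?: number;
}

// Hypothetical helper: builds a pre-navigation hook that raises the redirect cap.
function raiseRedirectLimit(limit: number) {
    if (!Number.isInteger(limit) || limit < 0) {
        throw new RangeError(`Redirect limit must be a non-negative integer, got ${limit}`);
    }
    // The hook mutates the options object in place before each request is sent.
    return (_crawlingContext: unknown, gotOptions: RedirectOptions): void => {
        gotOptions.maxRedirects = limit;
    };
}

// Intended wiring (not executed here, requires the `crawlee` package):
// const crawler = new CheerioCrawler({
//     preNavigationHooks: [raiseRedirectLimit(20)],
//     requestHandler: async ({ $ }) => { /* ... */ },
// });
```

A chain of 10 redirects can also mean the site is bouncing the request in a loop (for example while waiting for a cookie), in which case a higher limit alone may not help.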
1 comment
The site I'm scraping uses fingerprint.com bot protection. Locally my code passes the protection 95% of the time, but when running the actor on Apify it never does. How is that possible?

To pass this protection I've implemented the following measures (complete code in the next message). This took some trial and error, so all feedback is welcome:
  • Browser Configuration
    • Using Firefox instead of Chrome/Chromium
    • Using incognito pages (useIncognitoPages: true)
    • Enabled fingerprint randomization (useFingerprints: true)
  • Random Viewport/Screen Properties
    • Random window dimensions (1280-1920 x 720-1080)
    • Random device scale factor (1, 1.25, 1.5, or 2)
    • Random mobile/touch settings
    • Random color scheme (light/dark)
  • Locale and Timezone Randomization
    • Random locale from 8 different options
    • Random timezone from 8 different global locations
  • Browser Property Spoofing
    • Removing navigator.webdriver flag
    • Random navigator.plugins array
    • Random navigator.platform
    • Random navigator.hardwareConcurrency (4-16)
    • Random navigator.deviceMemory (2-16GB)
    • Random navigator.languages
    • Random navigator.maxTouchPoints
  • Chrome Detection Evasion
    • Removing Chrome DevTools Protocol (CDP) detection properties (cdc_adoQpoasnfa76pfcZLmcfl_*)
  • Performance Timing Randomization
    • Modifying performance.getEntries() to add random timing offsets
    • Randomizing both startTime and duration of performance entries
  • Proxy Usage
    • Using residential proxies (groups: ['residential'])
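
The randomized values in the list above could be generated roughly like this. It is only a sketch and not the code from the next message: the `BrowserProfile` shape, the `randomProfile` helper, and the concrete locale, timezone and memory pools are assumptions chosen for illustration (the post only says there are 8 locales and 8 timezones). The field names mirror Playwright's browser context options so that part of the result could be passed along there.

```typescript
type Rng = () => number; // returns a float in [0, 1)

interface BrowserProfile {
    viewport: { width: number; height: number };
    deviceScaleFactor: number;
    colorScheme: "light" | "dark";
    locale: string;
    timezoneId: string;
    hardwareConcurrency: number;
    deviceMemory: number;
}

// Hypothetical pools: the original post does not list the actual values.
const LOCALES = ["en-US", "en-GB", "de-DE", "fr-FR", "es-ES", "it-IT", "nl-NL", "pt-BR"] as const;
const TIMEZONES = [
    "America/New_York", "America/Los_Angeles", "Europe/London", "Europe/Berlin",
    "Europe/Paris", "Asia/Tokyo", "Asia/Singapore", "Australia/Sydney",
] as const;
const SCALE_FACTORS = [1, 1.25, 1.5, 2] as const;
const MEMORY_GB = [2, 4, 8, 16] as const; // assumption: powers of two within 2-16 GB

// Inclusive random integer in [min, max].
function randomInt(rng: Rng, min: number, max: number): number {
    return min + Math.floor(rng() * (max - min + 1));
}

// Uniformly pick one element of a non-empty list.
function pick<T>(rng: Rng, items: readonly T[]): T {
    return items[Math.floor(rng() * items.length)] as T;
}

// Builds one profile; passing a custom rng makes the output reproducible.
function randomProfile(rng: Rng = Math.random): BrowserProfile {
    return {
        viewport: { width: randomInt(rng, 1280, 1920), height: randomInt(rng, 720, 1080) },
        deviceScaleFactor: pick(rng, SCALE_FACTORS),
        colorScheme: rng() < 0.5 ? "light" : "dark",
        locale: pick(rng, LOCALES),
        timezoneId: pick(rng, TIMEZONES),
        hardwareConcurrency: randomInt(rng, 4, 16),
        deviceMemory: pick(rng, MEMORY_GB),
    };
}
```

Drawing each value independently can produce combinations that look inconsistent (say, a locale and timezone that do not match the proxy's country), so it may be worth generating one profile per session and keeping it coherent with the proxy location.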
6 comments
I introduced a new version of my actor, but how do I make it the latest version? I assume that the README is also taken from the latest version?
I've developed an actor which I would like to publish as it is finished. However, in order to scrape sufficient data, proxies would be necessary. How does this work and who pays for the proxies when the actor is published? I'm currently doing local development on a free account.
1 comment