Apify and Crawlee Official Forum

Updated 4 weeks ago

Strangely, the block rate of Apify IPs is high on websites within Korea.

Hello, I am currently using Apify in Korea to run LLM queries.

Recently, however, major Korean services seem to be blocking Apify almost entirely.

For example, it appears to be blocked on sites like the following:
I use several proxies in parallel for my LLM workload, but Apify stands out: on Korean online shopping malls it is detected more than 97% of the time.
I am using the RESIDENTIAL proxy, mainly with countryCode set to KR. I am not using Puppeteer at all; I replay the exact HTTP requests directly.
1 comment
Here are some suggestions to troubleshoot and improve the situation:

  1. Proxy Rotation and Quality:
    • Make sure the Residential proxies you use are high quality. Even Residential IPs can be flagged if they are overused or already appear on blocklists.
    • Rotate proxies more frequently to avoid triggering detection mechanisms.
    • Experiment with a mix of IPs from nearby countries (e.g., JP, CN) instead of exclusively KR; this might bypass local restrictions.
  2. Request Headers and Payload:
    • Double-check your headers and payloads. Mimic real browser requests by including accurate User-Agent strings, cookies, and other headers.
    • Tools like Apify Fingerprint Suite can help generate realistic browser fingerprints.
  3. Request Throttling and Timing:
    • Reduce the frequency of requests to avoid looking like a bot. Implement random delays between requests.
    • Spread requests over a longer period instead of sending them in bursts.
  4. Alternative Scraping Techniques:
    • Use headless browsers like Puppeteer or Playwright for challenging sites. Even though you’re not currently using them, these tools can mimic user behavior effectively and bypass detection in many cases.
    • Implement CAPTCHA-solving services if CAPTCHA blocks are common.
  5. Evaluate Target Sites:
    • Some platforms (like the ones you mentioned) may use advanced bot-detection services such as Akamai or Cloudflare. Scraping these may require tailored solutions or specialized proxies.
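To make the rotation, header, and throttling suggestions above more concrete, here is a minimal Python sketch using only the standard library. It is not an official Apify snippet. The proxy host, port, and the `groups-…,session-…,country-…` username format are my understanding of Apify's proxy connection settings, so please verify them against the current docs. The helper names (`new_session_id`, `apify_proxy_url`, `jittered_delay`, `build_opener`) and the header values are hypothetical and only for illustration.

```python
import random
import secrets
import urllib.request
from typing import Optional

# Assumed Apify Proxy endpoint; check the proxy docs for your account.
APIFY_PROXY_HOST = "proxy.apify.com"
APIFY_PROXY_PORT = 8000

# Illustrative browser-like headers; copy real ones from your own browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
}


def new_session_id(prefix: str = "kr") -> str:
    """Return a fresh session name; a new name should map to a new IP."""
    return f"{prefix}_{secrets.token_hex(4)}"


def apify_proxy_url(password: str, country: str = "KR",
                    session: Optional[str] = None) -> str:
    """Build a proxy URL for the RESIDENTIAL group (assumed username format)."""
    parts = ["groups-RESIDENTIAL"]
    if session:
        parts.append(f"session-{session}")
    parts.append(f"country-{country}")
    username = ",".join(parts)
    return f"http://{username}:{password}@{APIFY_PROXY_HOST}:{APIFY_PROXY_PORT}"


def jittered_delay(base: float = 2.0, jitter: float = 1.5,
                   rng: Optional[random.Random] = None) -> float:
    """Return base seconds plus a random extra in [0, jitter]."""
    source = rng if rng is not None else random
    return base + source.uniform(0.0, jitter)


def build_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Create an opener that sends browser-like headers through the proxy."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    opener.addheaders = list(BROWSER_HEADERS.items())
    return opener
```

In a request loop you would call `build_opener(apify_proxy_url(password, session=new_session_id()))` for each request (or each small batch) and `time.sleep(jittered_delay())` between requests. To try nearby countries, pass `country="JP"` for a share of the traffic and compare the block rates.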
This section of the docs might also be useful:
https://docs.apify.com/academy/anti-scraping

In general, if you're having trouble bypassing protection with HTTP requests, consider using Playwright with a Firefox browser configuration. Pairing it with Residential proxies can make your scraper more difficult to detect, though it may increase resource consumption and costs.
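As a rough illustration of the Playwright plus Firefox idea, here is a minimal Python sketch. It assumes the `playwright` package and its Firefox build are installed, and that the proxy accepts the credentials as a plain username and password pair. The helper names `playwright_proxy` and `fetch_html`, the default server address, and the example username are assumptions for illustration, not something taken from the thread.

```python
def playwright_proxy(username: str, password: str,
                     server: str = "http://proxy.apify.com:8000") -> dict:
    """Build the proxy settings dict that Playwright's launch() accepts."""
    return {"server": server, "username": username, "password": password}


def fetch_html(url: str, proxy: dict, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Firefox through the proxy and return its HTML."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=proxy)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            return page.content()
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

A call would look like `fetch_html(target_url, playwright_proxy("groups-RESIDENTIAL,country-KR", password))`, where the username format is again an assumption to check against the proxy docs. Expect this to be noticeably slower and more expensive per page than plain HTTP requests.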