Apify Discord Mirror

Updated 2 years ago

Captcha detection?

At a glance

The community member is looking for a generic way to detect captchas on web pages. They have encountered a page that returns HTTP status 200 but still displays a captcha, and they noticed a meta tag in its HTML that indicates a captcha challenge. Rather than relying on HTTP status codes or on one specific tag, they would like to use a function in Playwright/Crawlee to detect captchas.

In the comments, another community member suggests trying the checks from the store-website-checker project, which may be moved to Crawlee in the future. However, they note that such checks will always be somewhat unstable, because the way captchas appear on pages can change over time.

How can I detect a captcha?

I see this in the response HTML:
<head>
  ...
  <meta name="captcha-challenge" content="1">
  ... 

but I would prefer to use some function in Playwright/Crawlee.
I mean, some generic way to detect captchas; who knows which variant of captcha I will get in the future...

I cannot rely on the HTTP status: this page returns status 200, but it still shows a captcha!
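A minimal sketch for the specific case quoted above, in TypeScript since that is what Playwright and Crawlee are typically used from. The helper name `hasCaptchaChallengeMeta` is hypothetical, and it scans the HTML with regular expressions instead of a real HTML parser, so it is a heuristic that only covers the `captcha-challenge` meta tag from the question, not other captcha variants.

```typescript
// Hypothetical helper: returns true if the HTML contains the
// <meta name="captcha-challenge" content="1"> marker seen in the question.
// Regex-based heuristic, not a full HTML parser.
function hasCaptchaChallengeMeta(html: string): boolean {
  // Collect every <meta ...> tag (case-insensitive).
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  return metaTags.some((tag) => {
    // Attribute order may vary, so extract name and content separately.
    const name = /\sname\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    const content = /\scontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    return name?.toLowerCase() === "captcha-challenge" && content === "1";
  });
}
```

In Playwright, the rendered HTML for this check could come from `await page.content()`, so the decision no longer depends on the status code.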
1 comment
You can try this - https://github.com/apify-projects/store-website-checker/blob/master/checker-cheerio/src/checkers.ts
We might move it to Crawlee, but these checks will always be a bit unstable, because the way captcha pages look can change.
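The following is not the code from checkers.ts; it is a hypothetical illustration of the general marker-based approach and of why it is fragile. `CAPTCHA_MARKERS` and `detectCaptchaMarkers` are invented names, and the marker patterns are assumptions about how reCAPTCHA, hCaptcha, and Cloudflare Turnstile usually show up in page HTML; any change in a vendor's markup can break them.

```typescript
// Hypothetical marker list. Each pattern is an assumption about typical
// captcha markup and may go stale when a vendor changes its HTML.
const CAPTCHA_MARKERS: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  // The tag from the question; presence is flagged regardless of its content value.
  { name: "captcha-challenge meta tag", pattern: /<meta[^>]*\sname\s*=\s*["']captcha-challenge["']/i },
  // reCAPTCHA script URL or widget class.
  { name: "reCAPTCHA", pattern: /google\.com\/recaptcha|class\s*=\s*["'][^"']*\bg-recaptcha\b/i },
  // hCaptcha script host or widget class.
  { name: "hCaptcha", pattern: /hcaptcha\.com|class\s*=\s*["'][^"']*\bh-captcha\b/i },
  // Cloudflare Turnstile script host or widget class.
  { name: "Cloudflare Turnstile", pattern: /challenges\.cloudflare\.com|class\s*=\s*["'][^"']*\bcf-turnstile\b/i },
];

// Returns the names of all markers found in the HTML (empty array = no captcha detected).
function detectCaptchaMarkers(html: string): string[] {
  return CAPTCHA_MARKERS.filter(({ pattern }) => pattern.test(html)).map(({ name }) => name);
}
```

In a Crawlee `PlaywrightCrawler` request handler, the HTML could come from `await page.content()`; throwing an error when the returned list is non-empty makes Crawlee count the request as failed and retry it (up to `maxRequestRetries`) instead of trusting the 200 status.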