How to debug seemingly no html in crawled response (CheerioCrawler)

Yam

I duplicated a custom Apify actor that was working great and didn't really change anything except a few selectors and the target site. Unfortunately, the actor seems to exit "successfully" after the first URL (only the start URL) is handled. None of my logging shows anything in the returned HTML, and enqueueLinks of course does nothing, yet Cheerio believes the page request responded successfully.

How would I approach debugging this situation? So far I've checked that $('body').html() returns an empty string, and I tried a RESIDENTIAL proxy with a geolocation local to the website in case it was clever blocking, but no success.

The url being scraped is https://www.tesco.com/groceries/en-GB/shop/health-and-beauty/shampoo/all?page=1&count=48

5 comments

HonzaS

Did you try $.html()?

JameEnder

Hello, for debugging I would advise using a client that displays the raw responses; I personally use Insomnia. When I request the URL you provided, it doesn't return any HTML elements inside the body, just some meta properties and a script.
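
The same check can be done in code on whatever HTML string the crawler received. The following is only a rough sketch: `summarizeHtml` and `looksLikeJsShell` are hypothetical helper names, not part of Crawlee or Cheerio, and the regex-based counting is a quick diagnostic under the assumption of reasonably well-formed markup, not a real HTML parser.

```javascript
// Summarize a raw HTML string to see whether the server sent real content
// or only a JavaScript shell (meta tags plus a script, empty body).
// Hypothetical helper; regexes are used on purpose to keep it dependency-free.
function summarizeHtml(html) {
  const bodyMatch = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  const body = bodyMatch ? bodyMatch[1] : '';

  // Drop script/style/noscript blocks, then all remaining tags,
  // to estimate how much visible text the body carries.
  const visibleText = body
    .replace(/<(script|style|noscript)[\s\S]*?<\/\1>/gi, '')
    .replace(/<[^>]+>/g, '')
    .trim();

  return {
    totalLength: html.length,
    hasBody: bodyMatch !== null,
    visibleTextLength: visibleText.length,
    scriptCount: (html.match(/<script\b/gi) || []).length,
    metaCount: (html.match(/<meta\b/gi) || []).length,
  };
}

// Heuristic (an assumption, not a rule): no visible text but at least one
// script suggests the page is rendered or gated by JavaScript.
function looksLikeJsShell(summary) {
  return summary.visibleTextLength === 0 && summary.scriptCount > 0;
}
```

Inside a CheerioCrawler request handler, one option would be to log `summarizeHtml($.html())` for the start URL. If the result looks like a JavaScript shell, the empty `$('body').html()` is explained by the response itself rather than by the selectors.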

Yam

Thanks for the pointers

Yam

Seems to be some "advanced" anti-crawling going on

HonzaS

Try Playwright, it should work.
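
For anyone landing here later, a minimal sketch of what switching to a browser-based crawler might look like, assuming the `crawlee` and `playwright` packages are installed. The function name `crawlWithBrowser` is hypothetical, and a real browser is not guaranteed to get past the site's anti-crawling measures.

```javascript
// Sketch only: crawl with a real browser so client-side JavaScript runs
// before the HTML is read. `crawlWithBrowser` is a hypothetical name.
async function crawlWithBrowser(startUrls) {
  // Imported lazily so this file still loads where crawlee is not installed.
  const { PlaywrightCrawler } = await import('crawlee');

  const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, enqueueLinks, log }) {
      // page.content() returns the DOM serialized after scripts have run,
      // unlike the raw response body that CheerioCrawler parses.
      const html = await page.content();
      log.info(`${request.url}: rendered HTML is ${html.length} chars`);

      // Enqueue links found on the rendered page (same-hostname by default).
      await enqueueLinks();
    },
  });

  await crawler.run(startUrls);
}
```

It could be called with the start URL from the question, for example `await crawlWithBrowser(['https://www.tesco.com/groceries/en-GB/shop/health-and-beauty/shampoo/all?page=1&count=48'])`. The existing Cheerio selectors would need to be ported to Playwright page calls.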


Apify and Crawlee Official Forum
