Apify

Apify and Crawlee Official Forum


Request works in Postman but doesn't work with CheerioCrawler, request object headers empty

Dear all, I am trying to scrape data from a public IP. For some reason CheerioCrawler is not getting the data back, but in Postman I can easily get it. The proxy IP is whitelisted, because I am using the same IP for both Postman and Cheerio.

Postman adds some default headers, but when I look at my request object the headers are empty. Does anyone know at which point Cheerio sets the headers and generates fingerprints, and how I can see them?

Request {
  id: 'OBTRQI5zvA4aIJ9',
  url: 'https://someapi.com',
  loadedUrl: 'https://someapi.com',
  uniqueKey: '22586062-3f0d-40be-b499-f1a00261b5d3',
  method: 'GET',
  payload: undefined,
  noRetry: false,
  retryCount: 0,
  errorMessages: [],
  headers: {},
  userData: [Getter/Setter],
  handledAt: undefined
}


Any help would be highly appreciated. Thanks.
9 comments
Request is our structure that stores which URL to call, with which HTTP method, and with which headers and payload.

You probably want response.headers, where response comes from the context passed to the requestHandler function.
Thanks for your response. Actually, I am more interested in what is being sent in the request headers. I debugged it further and found that when I try to scrape the API, it doesn't work on the first try, but when I refresh the browser opened by Crawlee, it does work. To check what was going on, I ran Playwright in headful mode and could see an error, but when I refreshed the same page I got the response back. The API I am trying to scrape is very sensitive to some headers, as you can see in the picture. I think some headers are not set properly in the initial request, and on refresh the browser adds its default headers and then it works.
[Attachment: image.png]
Oh, those headers.
You can add them yourself!
When you enqueue the link, you can pass an object with url and headers instead of a plain URL string, and include any header you need on the initial request.
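
For anyone landing here later, a minimal sketch of what such an object might look like. The `RequestInput` type is a local stand-in for the relevant subset of Crawlee's request options, and the header values are hypothetical; you would copy the real ones from the request that works in Postman or in the browser.

```typescript
// Local stand-in for the subset of Crawlee's request options used here
// (assumption: only `url`, `method`, and `headers` matter for this case).
interface RequestInput {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
}

// Hypothetical header values; replace them with the headers the API expects.
const browserLikeHeaders: Record<string, string> = {
  accept: "application/json",
  "accept-language": "en-US,en;q=0.9",
  referer: "https://someapi.com/",
};

// Build a request object that carries the headers along with the URL.
function buildRequest(url: string, headers: Record<string, string>): RequestInput {
  // Copy the headers so later changes to the shared object do not leak in.
  return { url, method: "GET", headers: { ...headers } };
}

const apiRequest = buildRequest("https://someapi.com", browserLikeHeaders);
console.log(apiRequest.headers);

// With Crawlee installed, the object can be used in place of a URL string, e.g.:
//   await crawler.run([apiRequest]);
//   await crawler.addRequests([apiRequest]);
```

The headers set this way are what shows up in the `headers` field of the Request object, which is why that field is empty when only a URL string is enqueued.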
Still doesn't work. With the same proxy it works in a plain browser, but when I use it with Crawlee it doesn't.
Do you have any idea?
Did you test it with different proxy groups?
It wasn't related to the proxy but rather to cookies. It's solved now.
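
The thread does not say how the cookie problem was fixed. One possible approach, sketched here under that caveat, is to send the cookies the API expects as a `cookie` header on the initial request. The `toCookieHeader` helper and the cookie names and values below are hypothetical.

```typescript
// Serialize cookie name/value pairs into a single `Cookie` header value.
function toCookieHeader(cookies: Record<string, string>): string {
  return Object.keys(cookies)
    .map((name) => `${name}=${cookies[name]}`)
    .join("; ");
}

// Hypothetical cookies; copy the real ones from a browser session that works.
const cookieHeader = toCookieHeader({ session_id: "abc123", consent: "yes" });

// Request object carrying the cookies on the very first request.
const requestWithCookies = {
  url: "https://someapi.com",
  headers: { cookie: cookieHeader },
};

console.log(requestWithCookies.headers.cookie);
```

This would also be consistent with the behaviour described earlier, where the first load failed and a refresh worked: a browser sends back on the second request any cookies it received on the first.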