Request works in Postman but doesn't work with Cheerio C...

ccurioussoul

Dear all, I am trying to scrape data from a public IP. For some reason CheerioCrawler doesn't get the data back, but in Postman I can get it easily. The proxy IP is whitelisted, since I use the same IP for Postman and for Cheerio.

Postman does add some default headers, but when I look at my request object the headers are empty. Does anyone know at which point CheerioCrawler sets the headers and generates fingerprints, and how I can see them?

Request {
  id: 'OBTRQI5zvA4aIJ9',
  url: 'https://someapi.com',
  loadedUrl: 'https://someapi.com',
  uniqueKey: '22586062-3f0d-40be-b499-f1a00261b5d3',
  method: 'GET',
  payload: undefined,
  noRetry: false,
  retryCount: 0,
  errorMessages: [],
  headers: {},
  userData: [Getter/Setter],
  handledAt: undefined
}

Any help would be highly appreciated. Thanks!
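My understanding is that Crawlee's HTTP crawlers send requests through got-scraping, which generates browser-like headers at send time, so those headers would not show up in the `headers` field of the `Request` object above. One way to see what actually differs is to capture the headers each client sends (for example from Postman's console, and from a request-echo endpoint the crawler is pointed at) and compare them. HTTP header names are case-insensitive, so the sketch below lower-cases them before comparing. `diffHeaders` is a hypothetical helper, not part of Crawlee, and the header values are placeholders rather than real captures.

```typescript
// Header names mapped to values, e.g. copied from Postman's console or an echo endpoint.
type HeaderMap = Record<string, string>;

interface HeaderDiff {
    onlyInFirst: string[];  // sent only by the first client
    onlyInSecond: string[]; // sent only by the second client
    differing: string[];    // sent by both, but with different values
}

const hasOwn = (obj: object, key: string): boolean =>
    Object.prototype.hasOwnProperty.call(obj, key);

// HTTP header names are case-insensitive, so compare them lower-cased.
function normalize(headers: HeaderMap): HeaderMap {
    const out: HeaderMap = {};
    for (const name of Object.keys(headers)) {
        out[name.toLowerCase()] = headers[name];
    }
    return out;
}

function diffHeaders(first: HeaderMap, second: HeaderMap): HeaderDiff {
    const a = normalize(first);
    const b = normalize(second);
    const diff: HeaderDiff = { onlyInFirst: [], onlyInSecond: [], differing: [] };
    for (const name of Object.keys(a)) {
        if (!hasOwn(b, name)) diff.onlyInFirst.push(name);
        else if (a[name] !== b[name]) diff.differing.push(name);
    }
    for (const name of Object.keys(b)) {
        if (!hasOwn(a, name)) diff.onlyInSecond.push(name);
    }
    return diff;
}

// Placeholder values for illustration only, not real captures.
const postmanHeaders: HeaderMap = {
    "User-Agent": "PostmanRuntime/x.y.z",
    "Accept": "*/*",
    "Cache-Control": "no-cache",
};
const crawlerHeaders: HeaderMap = {
    "user-agent": "Mozilla/5.0 (...)",
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9",
};

console.log(diffHeaders(postmanHeaders, crawlerHeaders));
```

With these placeholder values the result lists `cache-control` as Postman-only, `accept-language` as crawler-only, and `user-agent` as differing; the headers in the first two groups are the first candidates to try setting explicitly.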


vvladdy

Request is our structure which stores what URL to call, what HTTP method, and with what headers/payload to call it!

You probably want response.headers, where response comes from the context of the requestHandler function.
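For reading the response side, a small sketch, assuming `response.headers` has the shape of Node's `IncomingHttpHeaders`: names are lower-cased and repeated headers such as `set-cookie` arrive as arrays. `formatHeaders` is a hypothetical helper that turns that object into one `name: value` line per header, which is easier to log and compare than the raw object.

```typescript
// Same shape as Node's IncomingHttpHeaders: lower-cased names, values may be arrays.
type IncomingHeaders = Record<string, string | string[] | undefined>;

// Produce one "name: value" line per header, sorted by name, for logging.
function formatHeaders(headers: IncomingHeaders): string[] {
    const lines: string[] = [];
    for (const name of Object.keys(headers).sort()) {
        const value = headers[name];
        if (value === undefined) continue;
        if (Array.isArray(value)) {
            // Repeated headers such as set-cookie: keep each value on its own line.
            for (const v of value) lines.push(`${name}: ${v}`);
        } else {
            lines.push(`${name}: ${value}`);
        }
    }
    return lines;
}

// Placeholder response headers for illustration.
const exampleHeaders: IncomingHeaders = {
    "content-type": "application/json",
    "set-cookie": ["a=1; Path=/", "b=2; Path=/"],
};
for (const line of formatHeaders(exampleHeaders)) console.log(line);
```

Inside a CheerioCrawler `requestHandler`, this would be called as `formatHeaders(response.headers)`, assuming `response` is the Node response object from the crawling context.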

ccurioussoul

Thanks for your response. Actually, I am more interested in what is being sent in the request headers. I debugged it further and found out that when I try to scrape the API, it doesn't work on the first try, but when I refresh the browser opened by Crawlee, it does work. To check what was going on, I used Playwright in headful mode: the first load showed an error, but when I refreshed the same page I got the response back. The API I am trying to scrape is very sensitive to some headers, as you can see in the picture. I think some headers are not set properly in the request, and on refresh the browser adds its default headers, which is why it works.

Attachment
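If the API really depends on headers that the crawler doesn't send, one option is to set them explicitly on each request instead of relying on defaults. The sketch below copies headers from a request that works (for example from Postman), drops ones that are tied to a single request or managed by the HTTP client, and produces plain `{ url, headers }` objects. Crawlee's request options do accept a `headers` field, but `buildRequests`, the skip list, and the header values here are hypothetical and should be adjusted to the real API.

```typescript
type RequestHeaders = Record<string, string>;

// Shape of the plain request objects passed to a crawler
// (assumption: only `url` and `headers` are needed here).
interface CrawlRequest {
    url: string;
    headers: RequestHeaders;
}

// Headers tied to a single request or managed by the HTTP client itself;
// copying them from Postman is usually unhelpful (hypothetical list, adjust as needed).
const SKIP = ["host", "content-length", "connection", "postman-token"];

function buildRequests(urls: string[], copied: RequestHeaders): CrawlRequest[] {
    const headers: RequestHeaders = {};
    for (const name of Object.keys(copied)) {
        if (SKIP.indexOf(name.toLowerCase()) === -1) headers[name] = copied[name];
    }
    // Give each request its own copy so per-request tweaks don't leak between them.
    return urls.map((url) => ({ url, headers: { ...headers } }));
}

// Placeholder values; copy the real ones from the request that works in Postman.
const fromPostman: RequestHeaders = {
    "Accept": "application/json",
    "Host": "someapi.com",
    "Postman-Token": "placeholder",
    "X-Example-Header": "placeholder",
};
const requests = buildRequests(["https://someapi.com"], fromPostman);
console.log(requests);
```

With Crawlee installed, objects like these can be passed to `crawler.run(...)`; the `headers` they carry should then also appear on the logged `Request` object instead of `{}`.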