Apify

Apify and Crawlee Official Forum

Custom headers

I have a suuuper secure website that I'm trying to scrape, and now I want to try using the sitemaps with google.com as the referer.
How can I set this header for all requests?
9 comments
Hello,
Which crawler do you use, Cheerio or Puppeteer?
Have you tried setting extra headers in postPageCreateHooks?

Plain Text
    // ...
    postPageCreateHooks: [async (page) => {
        await page.setExtraHTTPHeaders({
            referer: 'google.com'
        }) 
    }],
    // ...
No. Does this go in the crawler creation in main.js?
Plain Text
ArgumentError: Did not expect property `postPageCreateHooks` to exist, got `async (page) => {
        await page.setExtraHTTPHeaders({
            referer: 'https://www.google.com'
        }) 
    }` in object `PuppeteerCrawlerOptions`
    at ow (/run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/node_modules/ow/dist/index.js:36:24)
    at new PuppeteerCrawler (/run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/node_modules/@crawlee/puppeteer/internals/puppeteer-crawler.js:77:26)
    at file:///run/media/neonomade/work/technitool_scrapers/MSC_Puppeteer/src/main.js:22:17
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25) {
  validationErrors: Map(1) {
    'PuppeteerCrawlerOptions' => Set(1) {
      'Did not expect property `postPageCreateHooks` to exist, got `async (page) => {\n' +
        '        await page.setExtraHTTPHeaders({\n' +
        "            referer: 'https://www.google.com'\n" +
        '        }) \n' +
        '    }` in object `PuppeteerCrawlerOptions`'
    }
  }
}

Node.js v18.15.0
Ah, I am sorry, it should probably be:
Plain Text
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    browserPoolOptions: {
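        // Runs for every page the browser pool creates, so the header applies to all requests.
        postPageCreateHooks: [async (page) => {
            await page.setExtraHTTPHeaders({
                referer: 'https://www.google.com/',
            });
        }],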
    },
});
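For anyone who wants to check the hook without launching a browser, here is a minimal sketch. It assumes the hook lives under `browserPoolOptions` as in the reply above. `buildCrawlerOptions` is a hypothetical helper name, not part of Crawlee, and the referer value is only an example.

```javascript
// Hypothetical helper (not part of Crawlee): builds crawler options that
// attach the same extra headers to every page the browser pool opens.
function buildCrawlerOptions(extraHeaders) {
    return {
        browserPoolOptions: {
            // Each hook is called with every newly created page.
            postPageCreateHooks: [
                async (page) => {
                    await page.setExtraHTTPHeaders(extraHeaders);
                },
            ],
        },
    };
}

// Example value; use whatever referer your target site expects.
const options = buildCrawlerOptions({ referer: 'https://www.google.com/' });

// Assumed usage in main.js (router and proxyConfiguration come from your own project):
// const crawler = new PuppeteerCrawler({ ...options, proxyConfiguration, requestHandler: router });
```

Keeping the options in a plain object means the hook can be exercised with a stub page first, then spread into the real `PuppeteerCrawler` options.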