Apify and Crawlee Official Forum

Tay
Joined August 30, 2024
Is there a way to use fingerprints with the Cheerio crawler? I need it to send Firefox headers, but it is currently sending Chromium ones:

Host: localhost:8000
Connection: keep-alive
Content-Length: 0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Ch-Ua: "Chromium";v="124", "Microsoft Edge";v="124", "Not-A.Brand";v="99"
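One way to approach this, sketched here under assumptions rather than as a confirmed answer, is to set got-scraping's header generator options from a pre-navigation hook. The option names (`useHeaderGenerator`, `headerGeneratorOptions`) are the same ones used in the cookie question further down this page; the helper name `makeFirefoxHeaderHook` and its `version` parameter are hypothetical and only exist for illustration.

```javascript
// Hypothetical helper (not part of Crawlee): returns a pre-navigation hook
// that asks the header generator for Firefox-style desktop headers.
function makeFirefoxHeaderHook({ version = 115 } = {}) {
    return (_crawlingContext, gotOptions) => {
        gotOptions.useHeaderGenerator = true;
        gotOptions.headerGeneratorOptions = {
            // Pin a single Firefox version, as in the snippet posted below.
            browsers: [{ name: 'firefox', minVersion: version, maxVersion: version }],
            devices: ['desktop'],
            operatingSystems: ['windows'],
            locales: ['en-US', 'en'],
        };
    };
}

// Usage sketch, assuming `crawlee` is installed:
//   import { CheerioCrawler } from 'crawlee';
//   const crawler = new CheerioCrawler({
//       preNavigationHooks: [makeFirefoxHeaderHook()],
//       requestHandler: async ({ request, log }) => log.info(request.url),
//   });
```

Whether the generated headers fully replace the Chromium-style `Sec-Ch-Ua` headers depends on the installed Crawlee and got-scraping versions, so it is worth checking the request against a local echo server like the one used above.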
6 comments
The Cheerio crawler does not seem to persist cookies that are set in the session. I have persistCookiesPerSession: true, and I have verified in the requestHandler that the cookie is saved in the session. But when I print the request headers, the Cookie header is not present, and the session in preNavigationHooks does not contain the cookies either.
import { CheerioCrawler, sleep } from 'crawlee';

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 10,
    requestHandlerTimeoutSecs: 30,
    maxRequestRetries: 10,
    useSessionPool: true,
    persistCookiesPerSession: true,
    preNavigationHooks: [
        async ({ request, session }, gotOptions) => {
            gotOptions.useHeaderGenerator = true;
            gotOptions.headerGeneratorOptions = {
                browsers: [{ name: 'firefox', minVersion: 115, maxVersion: 115 }],
                devices: ['desktop'],
                operatingSystems: ['windows'],
                locales: ['en-US', 'en'],
            };
            console.log('START PRE HOOK');
            console.log(request.url);

            // THIS IS EMPTY ON SECOND REQUEST
            console.log(session?.getCookies(request.url));
            console.log(gotOptions.headers);
            console.log('END PRE HOOK');
        },
    ],
    requestHandler: async ({ response, request, session, log, addRequests }) => {
        const refresh = response.headers?.refresh;
        if (refresh && session) {
            console.log(response.request.options.headers);
            log.info(`Access queue detected, waiting for ${refresh} seconds...`);
            
            // Cookies are present here
            console.log(session.getCookies(request.url));
            // The Refresh header is a string, so parse it with an explicit radix
            await sleep((parseInt(refresh, 10) - 1) * 1000);
            await addRequests([{ url: request.url, uniqueKey: new Date().toString() }]);
        }
    },
});
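One possible explanation, not confirmed in this thread, is that the re-enqueued request is handled by a different session from the pool, so the cookies stored on the first session are not visible to it. A minimal workaround sketch under that assumption is to carry the cookie string on the re-enqueued request and apply it in the pre-navigation hook. The `carriedCookies` key and both helper names are hypothetical.

```javascript
// Hypothetical key under request.userData; not part of Crawlee's API.
const CARRIED_COOKIES_KEY = 'carriedCookies';

// Build the follow-up request and attach the cookie string to it.
function buildRetryRequest(url, cookieString) {
    return {
        url,
        // A fresh uniqueKey so the request queue does not deduplicate the URL.
        uniqueKey: `${url}#retry-${Date.now()}`,
        userData: { [CARRIED_COOKIES_KEY]: cookieString },
    };
}

// Copy carried cookies (if any) into the outgoing request headers.
function applyCarriedCookies(request, gotOptions) {
    const carried = request.userData?.[CARRIED_COOKIES_KEY];
    if (!carried) return;
    gotOptions.headers = { ...gotOptions.headers, Cookie: carried };
}

// Usage sketch inside the crawler above:
//   preNavigationHooks: [({ request }, gotOptions) => applyCarriedCookies(request, gotOptions)],
//   ...
//   await addRequests([buildRetryRequest(request.url, session.getCookieString(request.url))]);
```

If a single shared session is acceptable, setting `sessionPoolOptions: { maxPoolSize: 1 }` on the crawler may be a simpler way to test the same hypothesis, since every request would then use the same session.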
1 comment
Is it possible to rate limit based on a key? So basically it would only process 1 URL at a time per key.
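To make the question concrete, here is a minimal sketch of per-key serialization in plain JavaScript. It is not a built-in Crawlee feature; the `KeyedLock` class and its methods are hypothetical. Tasks that share a key are chained so only one runs at a time, while tasks with different keys run independently.

```javascript
// Hypothetical helper: runs at most one task at a time per key.
class KeyedLock {
    constructor() {
        // Maps each busy key to a promise that settles when its last queued task ends.
        this.tails = new Map();
    }

    isLocked(key) {
        return this.tails.has(key);
    }

    async runExclusive(key, task) {
        const previous = this.tails.get(key) ?? Promise.resolve();
        // Start this task only after the previous task for the key has finished.
        const current = previous.then(task);
        // The stored tail never rejects, so one failure does not block later tasks.
        const tail = current.catch(() => {});
        this.tails.set(key, tail);
        try {
            return await current;
        } finally {
            // Drop the entry once nothing else is queued behind this task.
            if (this.tails.get(key) === tail) this.tails.delete(key);
        }
    }
}

// Usage sketch inside a requestHandler, keyed by hostname:
//   const lock = new KeyedLock();
//   requestHandler: async ({ request }) => {
//       await lock.runExclusive(new URL(request.url).hostname, async () => {
//           // process the page here
//       });
//   },
```

A handler that is waiting for its key still occupies a concurrency slot and its wait counts toward `requestHandlerTimeoutSecs`, so this approach suits cases with many distinct keys better than cases where most URLs share one key.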
2 comments