Apify and Crawlee Official Forum

Updated 4 months ago

Sessions and proxies?

I am having a hard time understanding sessions and proxies. I have the following crawler setup:

const crawler = new PuppeteerCrawler({
    requestList,
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    requestHandler: router,
    requestHandlerTimeoutSecs: 100,
    headless: false,
    minConcurrency: 20,
    maxConcurrency: 30,
    launchContext: {
        launcher: PuppeteerExtra,
        useIncognitoPages: true
    },
})


Basically I want to run the same task concurrently with different proxies. Unless I set useIncognitoPages: true, only one session is used concurrently with one proxy. Is this how it should work? What is the point of having a session pool if only one is used?
5 comments
Session logic is to stick with one IP (proxy) until an error occurs and to keep cookies for as long as the session is "alive". So if you want a random IP for each request, do not use sessions; if you need cookies, handle them with your own logic.
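To picture the stickiness described above, here is a toy model in plain JavaScript. It is only a sketch and not Crawlee's actual implementation: the class name `ToySessionProxies`, its methods, and the `proxy-a.example` / `proxy-b.example` URLs are all made up for illustration. It assumes nothing more than "one session keeps one proxy until the session is retired".

```javascript
// Toy model only: NOT Crawlee's implementation, all names are hypothetical.
class ToySessionProxies {
    constructor(proxyUrls) {
        this.proxyUrls = proxyUrls;
        this.nextIndex = 0;
        this.bySession = new Map(); // sessionId -> proxy URL
    }

    // Return the proxy bound to this session, binding a new one on first use.
    proxyFor(sessionId) {
        if (!this.bySession.has(sessionId)) {
            const url = this.proxyUrls[this.nextIndex % this.proxyUrls.length];
            this.nextIndex += 1;
            this.bySession.set(sessionId, url);
        }
        return this.bySession.get(sessionId);
    }

    // Forget the binding, e.g. after the session hit an error.
    retire(sessionId) {
        this.bySession.delete(sessionId);
    }
}

const proxies = new ToySessionProxies([
    'http://proxy-a.example:8000',
    'http://proxy-b.example:8000',
]);

const first = proxies.proxyFor('session_1');       // bound on first use
const again = proxies.proxyFor('session_1');       // same session, same proxy
proxies.retire('session_1');                       // simulate an error
const afterRetire = proxies.proxyFor('session_1'); // rebound to the next proxy
```

In this model, requests that reuse a session ID land on the same proxy, which is why reusing a session never looks like random rotation.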
So with concurrency, does Crawlee use the same session in parallel if I use the session pool?
Regarding manually handling cookies and stuff, probably easier to set useIncognitoPages: true. That way each page has its own proxy and everything is handled.
How would I use random proxies without useSessionPool? With the below config Puppeteer is running on the same proxy.
const crawler = new PuppeteerCrawler({
    // useSessionPool: true,
    requestHandler: router,
    maxConcurrency: 2,
    headless: false,
    proxyConfiguration,
    requestList,
})

await crawler.run()


And pages also share cookies.
If you need random access, the expected way is useSessionPool: false and persistCookiesPerSession: false. Otherwise, I'm not sure exactly how it will end up with other session settings and incognito pages: maybe the SDK will enforce cookies, maybe not. I have never actually tried it this way 😉 To check in more detail, you can add some log output based on context.session.
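A minimal sketch of both suggestions, assuming the crawling context exposes `request`, `session`, and `proxyInfo` (with `session.id` and `proxyInfo.url`). The names `randomProxyOptions` and `describeSession` are hypothetical helpers, not part of Crawlee, and whether these flags really yield a fresh proxy per request is exactly what the log line is meant to verify.

```javascript
// Flags from the reply above; spread these into the PuppeteerCrawler options.
const randomProxyOptions = {
    useSessionPool: false,
    persistCookiesPerSession: false,
    maxConcurrency: 2,
};

// Hypothetical helper: builds one log line from the crawling context.
// session and proxyInfo may be undefined, so both are guarded.
function describeSession({ request, session, proxyInfo }) {
    const sessionId = session ? session.id : 'no-session';
    const proxyUrl = proxyInfo ? proxyInfo.url : 'no-proxy';
    return `${request.url} session=${sessionId} proxy=${proxyUrl}`;
}

// Usage inside a handler (not executed here; `router` is the poster's router):
// requestHandler: async (context) => {
//     context.log.info(describeSession(context));
//     await router(context);
// },
```

Note that the proxy URL may contain credentials, so it is worth masking it before sharing logs. Comparing the `session=` and `proxy=` values across concurrent requests shows whether pages really share a session or a proxy.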