Apify and Crawlee Official Forum

How to manually pass datasets, sessions, cookies, proxies between Requests?

It might be obvious, but I have not been able to figure this out, either in the documentation or in the forums.
I want to manage my datasets and sessions manually: I want a Request to use a session I have created, and to pass a dataset on to the request's handler.
I know I could pass them via userData, or create them in a different file and simply import them, but both feel like the wrong approach.
For datasets - you could open e.g. several named datasets, and then save to one or another depending on some condition. For the session pool - you could also provide e.g. a createSessionFunction in sessionPoolOptions - https://crawlee.dev/api/core/interface/SessionPoolOptions#createSessionFunction
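A minimal sketch of both ideas, assuming a CheerioCrawler; the dataset names, URL, and the 100-use limit are placeholders:

```ts
import { CheerioCrawler, Dataset, Session } from 'crawlee';

// Two named datasets; which one a result lands in is decided per item.
const products = await Dataset.open('products');
const failures = await Dataset.open('failed-pages');

const crawler = new CheerioCrawler({
    useSessionPool: true,
    sessionPoolOptions: {
        // Build each session yourself; Crawlee still handles rotation/retirement.
        createSessionFunction: (sessionPool) =>
            new Session({
                sessionPool,
                maxUsageCount: 100, // placeholder: retire after 100 uses
            }),
    },
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        if (title) {
            await products.pushData({ url: request.url, title });
        } else {
            await failures.pushData({ url: request.url });
        }
    },
});

await crawler.run(['https://example.com']);
```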

You could also use BasicCrawler https://crawlee.dev/api/basic-crawler and make the request explicitly, mark the session good/bad, etc.
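A sketch of that, assuming the sendRequest helper from the crawling context (the URL and the 200-only success rule are placeholders):

```ts
import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    useSessionPool: true,
    async requestHandler({ request, session, sendRequest }) {
        // You fire the HTTP call yourself, so you decide what counts
        // as success or failure for the session.
        const response = await sendRequest(); // uses request.url by default

        if (response.statusCode === 200) {
            session?.markGood();
            // ...parse response.body here...
        } else {
            session?.markBad(); // lowers the score; retired past maxErrorScore
            throw new Error(`Status ${response.statusCode} for ${request.url}`);
        }
    },
});

await crawler.run(['https://example.com']);
```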

But I guess the main question is - what exactly are you trying to achieve?
Dataset management should be separate logic imho, since it's not related to how you make requests. Cookies for a raw request always travel in the headers, so if you want to keep making requests as a logged-in user, find the auth cookies and reuse them.
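For a raw request outside a crawler, reusing auth cookies could look like this sketch with got-scraping - the cookie string, URL, and proxy URL are placeholders you would substitute with your own:

```ts
import { gotScraping } from 'got-scraping';

// Placeholder: captured after logging in, e.g. from a response's
// set-cookie headers or from session.getCookieString(url).
const authCookies = 'sessionid=abc123; csrftoken=xyz789';

const response = await gotScraping({
    url: 'https://example.com/account',
    headers: {
        // Raw requests carry cookies in the Cookie header.
        cookie: authCookies,
    },
    // Placeholder: pin the same proxy the cookies were issued under.
    proxyUrl: 'http://user:pass@proxy.example.com:8000',
});

console.log(response.statusCode);
```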
Honestly, I just want to make sure I am using the authentication cookies with the same proxy in the same session, since I don't know exactly how Crawlee handles the session when a Request is made.
You are absolutely right, the dataset logic makes a lot more sense kept separate. I am worried about hitting the server from different proxies that carry the same auth cookies.
"Having our cookies and other identifiers used only with a specific IP will reduce the chance of being blocked." https://crawlee.dev/docs/guides/session-management i was trying to do this,
The documentation is very good at explain how to use every class separately but doe snot provide example of how to use it in the crawler
In crawlers, the session is part of the crawlingContext. You could also provide sessionPoolOptions, where you can specify the session options, a createSessionFunction, etc. You could even limit the pool to a single session to make sure you're only going out through one IP, or specify that a session should be retired after, say, 500 requests. Do you need to use only one account, and is that why you want to use only one session all the time?
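A sketch of that single-session setup - the proxy URL is a placeholder, and maxUsageCount: 500 mirrors the "retired after 500 requests" idea:

```ts
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const crawler = new CheerioCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: ['http://user:pass@proxy.example.com:8000'], // placeholder
    }),
    useSessionPool: true,
    // Cookies a site sets are stored on the session and reused with it.
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: 1, // one session -> one cookie jar paired with one proxy
        sessionOptions: {
            maxUsageCount: 500, // retire (and rotate) after 500 requests
        },
    },
    async requestHandler({ request, session }) {
        // Every request here goes through the same session, hence the same IP.
        console.log(`${request.url} handled by session ${session?.id}`);
    },
});

await crawler.run(['https://example.com']);
```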