Also, related to this, in our current/old system, we would open tabs for threaded execution. I'm not sure if I need to do something similar with Crawlee? And, given this, I'm not sure how this works.
Regarding the "threaded" execution, Crawlee handles per-request concurrency automatically, so you don't really have to worry about it (it scales up and down based on the current system load).
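If you do want to put bounds on the autoscaling, here's a rough sketch. minConcurrency and maxConcurrency are Crawlee crawler options; the concurrencyFor helper and the pages-per-CPU ratio are made up for illustration, and the interface is a local stand-in, not Crawlee's own type.

```typescript
// Local stand-in for the two Crawlee crawler options used here (assumption:
// the real options object has many more fields).
interface ConcurrencyOptions {
    minConcurrency: number;
    maxConcurrency: number;
}

// Hypothetical helper: derive an upper bound on parallel pages from the
// number of CPUs available to one worker. The ratio is a guess, tune it.
function concurrencyFor(cpuCount: number, pagesPerCpu = 2): ConcurrencyOptions {
    const maxConcurrency = Math.max(1, Math.floor(cpuCount * pagesPerCpu));
    return { minConcurrency: 1, maxConcurrency };
}

const concurrencyOptions = concurrencyFor(4);
// These would be spread into the crawler options, e.g.
// new PlaywrightCrawler({ ...concurrencyOptions, requestHandler })
```

Crawlee still scales between the two bounds on its own, so this only caps it.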
so, it basically works with playwright's options for this?
Yep, launchContext.userDataDir is just passed to Playwright afaik. You can pass more launch options to the browser (like CLI arguments) in launchContext.launchOptions (check out the TS type annotation in your IDE, it gives you all the options you can use)
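Roughly, the shape would look like this. It's a sketch, not copied from the docs: the interface is a trimmed local stand-in for Crawlee's launch context type, buildLaunchContext is a hypothetical helper, and the path and the --disable-gpu flag are just placeholders.

```typescript
// Trimmed local stand-in for the launch context shape (assumption: the real
// type exposes more fields, check the annotation in your IDE).
interface LaunchContextSketch {
    userDataDir?: string;
    launchOptions?: { headless?: boolean; args?: string[] };
}

// Hypothetical helper: one persistent profile directory plus extra
// browser CLI arguments passed through launchOptions.args.
function buildLaunchContext(userDataDir: string, extraArgs: string[] = []): LaunchContextSketch {
    return {
        userDataDir,
        launchOptions: {
            headless: true,
            args: ['--disable-gpu', ...extraArgs],
        },
    };
}

const launchContext = buildLaunchContext('/data/profiles/example');
// This object would then go into the crawler options, e.g.
// new PlaywrightCrawler({ launchContext, requestHandler })
```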
so, we're running in Kubernetes.. with multiple worker processes.. so i'd need to mount this standard data directory into all of my workers.. so they could get to the correct path.
we're already stuffing the browser with cookies before the retrieve, and dumping them back out after the page is loaded
this would be the data directory that would store "other" stuff.. i'd guess. that would help us keep things "clean" between users.
I don't think we ever tried anything like this, but yes - in theory, it should work like this 🙂
If you keep the mapping "one user = one userDataDir", you might even save yourself the hassle of injecting the cookies - the cookies are saved in the userDataDir (along with localStorage contents etc.) This also shows why you definitely shouldn't share the same userDataDir between multiple users 🙂 If you don't specify this option, Playwright generates a new ephemeral userDataDir for each script execution iirc.
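For the "one user = one userDataDir" mapping, something along these lines might work. Purely a sketch: userDataDirFor, the /mnt/profiles mount point and the sanitising rule are all assumptions on my side, nothing Crawlee or Playwright provides.

```typescript
// Hypothetical helper: map a user id to its own profile directory under a
// shared mount, so every worker resolves the same path for the same user.
function userDataDirFor(baseDir: string, userId: string): string {
    // Replace anything that is not a safe file name character, which also
    // rules out path traversal through ids like "../other-user".
    const safeId = userId.replace(/[^A-Za-z0-9_-]/g, '_');
    if (safeId.length === 0) {
        throw new Error('userId must not be empty');
    }
    // Drop trailing slashes from the base so the result has a single separator.
    return `${baseDir.replace(/\/+$/, '')}/${safeId}`;
}

const profileDir = userDataDirFor('/mnt/profiles/', 'alice@example.com');
// profileDir is '/mnt/profiles/alice_example_com'
```

Two caveats: different ids can collapse to the same name after sanitising (a.b and a_b), so a hash of the id may be safer, and afaik Chromium locks a profile directory while it is in use, so two workers shouldn't open the same user's directory at the same time.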
thanks for that.. yeah, we have initial cookies we'd need to inject. but otherwise, yeah, that seems like it would be logical.