Apify and Crawlee Official Forum

wflanagan
Joined August 30, 2024
I am working on a Crawlee-based crawler that performs actions on a "per user" basis. Given this, I want to keep a configuration of the browser on a per-user basis. I already store cookies and push them into the browser before each page is loaded on behalf of the user. But this is all done in the same browser, and I think that is throwing things off.

I run into problems where the cookies "go bad". This could have NOTHING to do with the architecture, but it seems to me that every browser is "different", and I think that might be what is causing the errors.

Does anyone have any thoughts on how to store browser state on a per-user basis? We also configure the proxy to use a sticky proxy/IP for each user where we can.

Thanks!
13 comments
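One pattern that keeps per-user state isolated is to open a fresh incognito context for every page and load/save that user's cookies around each navigation, keyed by a `userId` carried on `request.userData`. Below is a minimal sketch, assuming Crawlee v3 with PlaywrightCrawler; the in-memory `cookieStore` map is a stand-in for whatever persistence you already use, and the per-user sticky proxy is left out.

```typescript
import { PlaywrightCrawler } from 'crawlee';
import type { BrowserContext } from 'playwright';

// Hypothetical per-user cookie store; swap in your real persistence layer.
type CookieList = Awaited<ReturnType<BrowserContext['cookies']>>;
const cookieStore = new Map<string, CookieList>();

const crawler = new PlaywrightCrawler({
    // Each page gets its own incognito context, so one user's cookies are
    // never visible to another user's pages in the same browser process.
    launchContext: { useIncognitoPages: true },
    preNavigationHooks: [
        async ({ page, request }) => {
            const { userId } = request.userData as { userId: string };
            // Load this user's stored cookies into the fresh context.
            const cookies = cookieStore.get(userId) ?? [];
            if (cookies.length > 0) await page.context().addCookies(cookies);
        },
    ],
    requestHandler: async ({ page, request }) => {
        const { userId } = request.userData as { userId: string };
        // ... do the per-user work here ...
        // Persist the refreshed cookies so the next request for this user
        // starts from the latest session state.
        cookieStore.set(userId, await page.context().cookies());
    },
});

await crawler.run([
    { url: 'https://example.com/account', userData: { userId: 'user-123' } },
]);
```

Because `useIncognitoPages` gives every page its own browser context, cookies added for one user shouldn't leak into another user's pages even though they share the same browser process.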
So, I'm trying to understand what I'm doing wrong architecturally. I define a "crawler" using PlaywrightCrawler. Then I add some URLs to the requestQueue and it runs fine. I can then load more into the crawler, call run again, and it works. But in the case where my crawler has a queue that keeps sending it new requests on an ongoing basis, I'm not sure how to architect this.

If you make it so each crawler receives a URL, processes it, and shuts down, it effectively becomes impossible to run things in parallel.

When you try to add things to the queue while the crawler is running (using addRequests), that seems to fail as well.

So, how do I architect this?

This is my example code for reference.

(I attempted to add the example code I'm using, but it was too long. So, here's a gist: )
38 comments
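For the "queue that keeps sending new requests" case, one approach is to keep a single crawler instance running and push work into it while it runs, rather than starting and stopping a crawler per URL. Here is a minimal sketch, assuming a Crawlee v3 release that supports the `keepAlive` option; the interval is just a stand-in for whatever queue or message bus delivers new work in your system.

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Keep the crawler running even when the request queue drains, so
    // requests added later are still picked up by the same instance.
    keepAlive: true,
    requestHandler: async ({ request, page }) => {
        console.log(`Processed ${request.url}: ${await page.title()}`);
    },
});

// run() only resolves once the crawler stops, so start it without awaiting
// and keep the promise around.
const runPromise = crawler.run();

// Stand-in for an external source feeding the running crawler.
let batch = 0;
setInterval(async () => {
    // A unique query string keeps the queue's deduplication from dropping
    // repeated additions of the same page.
    await crawler.addRequests([{ url: `https://crawlee.dev/?batch=${batch++}` }]);
}, 10_000);

await runPromise; // resolves only after the crawler is stopped elsewhere
```

With `keepAlive: true` the crawler does not shut down when the queue is empty, so later `addRequests()` calls are still processed in parallel under the crawler's configured concurrency.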