Apify

Apify and Crawlee Official Forum


Pause concurrent requests?

Hello,
I have the following issue: I'm scraping a website that requires me to log in again every 100-150 items.
With more than one concurrent request, other requests are already in progress when it needs to log in, and those go wrong.
I extract a marker that tells me when I need to log in again.
I want to run with more than one concurrent request, stop everything when that marker is found, do the login, and then resume.
Is it possible to achieve that?
22 comments
There's no straightforward way to do it, but you could keep a variable that you check at the beginning of requestHandler: if it says a login is in progress, the handler just waits and reloads the page once the login succeeds. Another option is to set concurrency to 1, throw an error for every other page, proceed with only one page, log in, and then increase concurrency again. There may be other ways, but these two options are the first that came to mind.
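The first option (a shared variable checked at the start of requestHandler) could be sketched roughly as below. This is only an illustration: `LoginGate`, `waitUntilOpen`, and `relogin` are hypothetical names, not part of Crawlee, and the actual login routine is passed in by the caller.

```typescript
// Hypothetical helper (not a Crawlee API): a gate that every handler awaits
// before doing work, so requests hold off while a login is running.
class LoginGate {
    // Non-null only while a login is in progress; it never rejects.
    private pending: Promise<void> | null = null;

    /** Wait until any in-progress login has finished. */
    async waitUntilOpen(): Promise<void> {
        // Loop in case another login started while we were waiting.
        while (this.pending) {
            await this.pending;
        }
    }

    /** Run `login` once, even if several handlers see the marker together. */
    async relogin(login: () => Promise<void>): Promise<void> {
        if (this.pending) {
            // Another handler is already logging in: just wait for it.
            return this.waitUntilOpen();
        }
        let release: () => void = () => {};
        this.pending = new Promise<void>((resolve) => {
            release = () => resolve();
        });
        try {
            await login();
        } finally {
            // Reopen the gate whether the login succeeded or failed.
            this.pending = null;
            release();
        }
    }
}
```

Inside a handler this might look like `await gate.waitUntilOpen()` as the first statement, followed by `await gate.relogin(doLogin)` and a page reload when the marker is found (`gate` and `doLogin` are assumed names). Requests that were already mid-flight when the session expired would still need that reload or a retry, as the reply above notes.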
Can I set concurrency from routes?
I've tried setting crawler.maxConcurrency, but it doesn't work.
It should be something like crawler.autoscaledPool.maxConcurrency. I'm not 100% sure and need to double-check, but I think that option should be correct.
yeah, should be correct
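A rough sketch of the "drop to 1, log in, raise again" idea follows. It assumes only that the pool object exposes a writable `maxConcurrency` property, as suggested above for crawler.autoscaledPool; `ConcurrencyControl` and `withSingleConcurrency` are hypothetical names introduced for illustration.

```typescript
// Minimal shape we rely on. The thread suggests crawler.autoscaledPool
// has a writable `maxConcurrency`; treat that as an assumption to verify.
interface ConcurrencyControl {
    maxConcurrency: number;
}

// Hypothetical helper: run `task` (e.g. the login) with concurrency
// limited to 1, then restore the previous limit even if the task throws.
async function withSingleConcurrency<T>(
    pool: ConcurrencyControl,
    task: () => Promise<T>,
): Promise<T> {
    const previous = pool.maxConcurrency;
    pool.maxConcurrency = 1;
    try {
        return await task();
    } finally {
        pool.maxConcurrency = previous;
    }
}
```

Lowering the limit only stops new requests from starting; requests that are already running are not cancelled, which is why the earlier reply also suggests throwing an error for the other pages so they get retried after the login.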
trying right now
Hello, have you been successful with your approach? Could you share it with us if you were?

I think I tried to solve something similar in the past, and the issue may be the cookies shared between tabs when you run with a concurrency above one. You could try:
    launchContext: {
        useIncognitoPages: true,
    },

And also setting new/empty cookies in preNavigationHooks for each login.
It didn’t work as expected, I will also try the incognito pages
You may also use it in combination with:
    browserPoolOptions: {
        maxOpenPagesPerBrowser: 1,
    },
If I go incognito, I trigger a check that they have, and I need to wait for 8 seconds.
Are you talking about some protection like Cloudflare? Did this help solve the original issue? As long as we don't know the website, we probably cannot help more. 🙂
elcorteingles.es/supermercado/
Here is the website. I have to remain logged in to keep my location; otherwise they send you to a default location.
If you open the website or a product page in private browsing, you'll see a loading circle that makes you wait for 5-8 seconds.
I believe this is related to the cookies, so what you need to do is get the cookies after the wait and before the login, remember them, and then set them in preNavigationHooks. There might be some additional magic, though: for example, the server may generate another sessionId for you and deny you logging out with the old one, but that is something you should try.
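A minimal sketch of that capture-and-reuse idea is below. It assumes a Playwright-style browser context with `cookies()` and `addCookies()`; the thread does not say which browser library is in use, and `CookieLike`, `CookieJar`, and `SessionCookieStore` are hypothetical names.

```typescript
// Reduced cookie shape for illustration; real cookies carry more fields.
interface CookieLike {
    name: string;
    value: string;
    domain?: string;
    path?: string;
}

// The two methods we rely on, modelled on a Playwright-style context.
interface CookieJar {
    cookies(): Promise<CookieLike[]>;
    addCookies(cookies: CookieLike[]): Promise<void>;
}

// Hypothetical helper: remembers the cookies issued after the waiting
// screen so later pages can reuse them instead of waiting again.
class SessionCookieStore {
    private saved: CookieLike[] = [];

    /** Call once the 5-8 second check has passed, before logging in. */
    async capture(context: CookieJar): Promise<void> {
        this.saved = await context.cookies();
    }

    /** Call before navigation, e.g. from a preNavigationHooks entry. */
    async restore(context: CookieJar): Promise<void> {
        if (this.saved.length > 0) {
            await context.addCookies(this.saved);
        }
    }
}
```

With Playwright pages, a hook along the lines of `async ({ page }) => store.restore(page.context())` would be one way to wire it in (`store` is an assumed variable). Whether the site accepts reused cookies is exactly the open question raised above.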

Also, a concurrency of 10 with an 8 s login should always be faster than 10 requests in sequence where each has its own 8 s login.
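As a back-of-the-envelope check of that comparison, here is a deliberately simplified model. It assumes every request pays the same login wait plus the same page time and ignores any other overhead; the 2 s page time used in the note below is an illustrative assumption, not a measurement.

```typescript
// Wall-clock estimate when requests run one after another.
function sequentialSeconds(requests: number, loginSecs: number, pageSecs: number): number {
    return requests * (loginSecs + pageSecs);
}

// Wall-clock estimate when up to `concurrency` requests run at once:
// requests proceed in batches, and each batch costs one login + page time.
function concurrentSeconds(
    requests: number,
    concurrency: number,
    loginSecs: number,
    pageSecs: number,
): number {
    const batches = Math.ceil(requests / concurrency);
    return batches * (loginSecs + pageSecs);
}
```

For 10 requests with an 8 s login and an assumed 2 s page time, this model gives 100 s in sequence versus 10 s at a concurrency of 10.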
it doesn't work
I am sorry, at this point I don't have enough information about what is happening in your browser, nor code to test.
I will drop my main and routes here.
I cannot see any of our suggestions being implemented 🙂
They have been tested and didn’t work. This is the implementation that works; it's slow, but it works.