Apify Discord Mirror

Updated 12 months ago

How to inject storage state into crawler's page

At a glance

Community members discuss how to inject a storageState into a crawler's page before the page is created, using the storageState from a session. The original post includes a snippet that does this in a prePageCreateHooks function with a value read from KeyValueStore, and asks how to do the same with the storageState from the session.

The asker wants access to the browserPool's context before it is created and before newPage is called, since passing options to that call is, in their view, the only way to inject the storageState in Playwright. Another member reports that modifying finalPageOptions works only with useIncognitoPages: true. Playwright seems to ignore the options passed from the browserPool unless in incognito mode, which the asker considers a possible bug.

One proposed solution is to intercept the browserPool's calls to newContext() and inject the options before the context is created, but nobody in the thread has found a way to do this.

How can a storageState be injected into a crawler's page before the page is created? Basically, how can I reproduce this, but using the storageState from the session instead:
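// Fragment of the crawler's browserPoolOptions
prePageCreateHooks: [
    async (pageId, browserController, pageOptions) => {
        // Read the previously saved state from the key-value store
        const storageState = await KeyValueStore.getValue('storageState');
        if (pageOptions && storageState) {
            pageOptions.storageState = storageState;
        }
    },
],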
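For context, the storageState that Playwright accepts is a plain object with a cookies array and an origins array (each origin carrying its localStorage entries). Below is a rough sketch of assembling one by hand. buildStorageState is a hypothetical helper, not part of Crawlee or Playwright, and it assumes the cookies already use Playwright-style fields (name, value, domain, path, ...). How the cookies are read from the session is left out.

```javascript
// Hypothetical helper: assemble an object in Playwright's storageState shape.
// `cookies` is assumed to already use Playwright-style cookie fields.
// `localStorageByOrigin` maps an origin to a plain { key: value } object.
function buildStorageState(cookies, localStorageByOrigin = {}) {
    return {
        // Shallow-copy each cookie so later edits do not leak back to the caller
        cookies: cookies.map((cookie) => ({ ...cookie })),
        origins: Object.entries(localStorageByOrigin).map(([origin, items]) => ({
            origin,
            // localStorage values are always strings
            localStorage: Object.entries(items).map(([name, value]) => ({
                name,
                value: String(value),
            })),
        })),
    };
}
```

The result can then be assigned to pageOptions.storageState in a hook like the one above.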
8 comments
Hey! Long time no see 🙂 Could you elaborate on what exactly you are trying to achieve? To be frank, I am not really sure what's happening there :/
Hey! Indeed. I need to restore the storageState from a session and don't know how to do that in a crawler... Ideally, I need access to the browserPool's context before it is created and before it calls newPage, but I haven't found a way. I want to pass options to the newPage call, which is the only way to inject the storageState in PW (Playwright).
Another Apify alumnus, bring beers 😄
I was trying to hijack newPage too recently, and found this old discussion mentioning that modifying finalPageOptions works only with useIncognitoPages: true.

And indeed, it works!
(didn't have time to investigate why that is, but that's the world we live in 😄)
And the thread you pointed to brings in another Apify alumnus 😄
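A minimal sketch of that workaround, assuming Crawlee's PlaywrightCrawler with launchContext.useIncognitoPages and browserPoolOptions.prePageCreateHooks. applyStorageState and makeStorageStateHook are hypothetical helper names, and loadState stands in for wherever the state actually lives (the original snippet reads it from KeyValueStore). The crawler wiring is shown only in comments and is untested against any particular Crawlee version.

```javascript
// Hypothetical helper: copy a saved storageState into the page options.
// Returns true when the options were modified, false otherwise.
function applyStorageState(pageOptions, storageState) {
    if (!pageOptions || !storageState) return false;
    pageOptions.storageState = storageState;
    return true;
}

// Hypothetical factory: wraps any async loader in the
// (pageId, browserController, pageOptions) hook signature.
function makeStorageStateHook(loadState) {
    return async (pageId, browserController, pageOptions) => {
        applyStorageState(pageOptions, await loadState(pageId));
    };
}

// Assumed wiring. Per the thread, the options only take effect
// when incognito pages are enabled:
//
// const crawler = new PlaywrightCrawler({
//     launchContext: { useIncognitoPages: true },
//     browserPoolOptions: {
//         prePageCreateHooks: [
//             makeStorageStateHook(() => KeyValueStore.getValue('storageState')),
//         ],
//     },
//     requestHandler: async ({ page }) => { /* ... */ },
// });
```

Note that the hook only receives the page ID, the browser controller, and the page options, so picking a per-session state inside loadState would need some mapping from page ID to session that this sketch does not provide.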
Cheers! PW ignores the options passed from browserPool unless it is in incognito mode. Not sure why, looks like a bug to me... the code there is quite messy. I think there should be a way to intercept the browserPool's calls to newContext() and inject the options before the context is created, but I didn't find any.
Do you know of any way?