Apify and Crawlee Official Forum
PlaywrightCrawler exception: page.content: Target page, context or browser has been closed

This exception happens in about 15-20% of all requests... quite often!

This line in my code:
Plain Text
content = await page.content();


throws this exception:
Plain Text
page.content: Target page, context or browser has been closed
   at (<somewhere-in-my-code>.js:170:54)
   at PlaywrightCrawler.requestHandler (<somewhere-in-my-code>.js:596:15)
   at async wrap (.../node_modules/@apify/timeout/index.js:52:21)


Is this a well-known issue?

Should I check (or wait for) something before calling page.content()?
I already check that response.status() is less than 400 (it is actually 200; I see it in the logs).
7 comments
This is not really connected to Crawlee; it is just a situation where the page was either closed or it crashed. You can try/catch that, but at that point the request is useless anyway.

I would try headful mode and see why it crashed. Sometimes it could be memory overload.
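For what it's worth, the suggested try/catch can be kept small. This is just a sketch (`getContentSafely` is a hypothetical helper name, not a Crawlee or Playwright API; the error-message check mirrors the message from the logs above):

```javascript
// Hypothetical helper: returns the page HTML, or null when the page
// handle is already dead (closed/crashed browser). Any other error
// is re-thrown so real bugs still surface.
async function getContentSafely(page) {
    try {
        return await page.content();
    } catch (err) {
        if (err instanceof Error
            && err.message.includes('Target page, context or browser has been closed')) {
            return null; // the request is useless; let the crawler retry it
        }
        throw err;
    }
}
```

Inside a requestHandler you would then treat a null result as "give up on this request" and return early.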
I have that "catch" block, and I see that the content is empty when this exception is thrown... so the catch block only prevents my program from crashing; this request is already useless, you are right.

Regarding "headful" mode... well, only some requests crash! From my point of view, headful mode would not be helpful.

I would focus on logs.

"page was closed or it crashed"
As far as I understand it, this "page" is something in another process, i.e. it lives in the browser. In the JS/Node/Playwright world we only have some handle/connection to the browser+page. Is that correct?

I am sure something like headless Firefox (I use only Firefox at the moment) can write log files. How do I enable/set up headless Firefox logging? Where should I look?
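Not Firefox-specific, but Playwright itself can dump the browser process output (which usually contains crash messages) via its DEBUG environment variable; `my-crawler.js` below is a placeholder for your own entry point:

```shell
# pw:browser prints the browser process stdout/stderr (crash output included);
# adding pw:api also logs every Playwright call your crawler makes.
DEBUG=pw:browser node my-crawler.js
DEBUG=pw:browser,pw:api node my-crawler.js
```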
Yes, a page is a handle to a tab in a browser. So if the underlying page (tab) or browser closes/crashes, Playwright cannot do anything about it and will throw this error.

I think there are 2 possible cases:
  1. You accidentally close the page somewhere, or you are missing an await and the handler has already finished when you call your code.
  2. The browser/tab just crashed, which can also happen on your laptop if it runs out of memory. Most crashes are caused by that, and it is fine to just let the request retry.
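Case 1 (the missing await) can be reproduced without any browser; this toy sketch (all names invented for illustration) shows how a handler can "finish" while its unawaited work is still pending, which is exactly the window in which the pool may close the page:

```javascript
// Simulates async "work" that takes a moment and records when it finishes.
function makeWork(state) {
    return async () => {
        await new Promise((resolve) => setTimeout(resolve, 10));
        state.finished = true;
    };
}

// BUG: missing `await` – the handler returns with the promise still dangling.
async function buggyHandler(doWork) {
    doWork();
}

// Correct: the handler only returns once the work is actually done.
async function fixedHandler(doWork) {
    await doWork();
}
```

In a real crawler, the framework may tear the page down as soon as the buggy handler returns, and the dangling page.content() call then fails with exactly the error above.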
Well, it seems this error (page.content: Target page, context or browser has been closed) is related to this setting:

Plain Text
    browserPoolOptions: {
        retireBrowserAfterPageCount: 3,
        ...
    }


I changed retireBrowserAfterPageCount to 100, and the error disappeared.

I should test a bit more to be 100% sure...

Anyway... I thought Crawlee/Playwright knew about these dependencies: "this browser instance is still processing request(s), so let's wait and retire it later, once all its requests have completely finished". It turns out that's not true.
It should be true: a browser should be retired only after its pages have been processed. You would need to share your code.
Well, I found a solution for this!

With these settings I saw many "Target page, context or browser has been closed" errors:
Plain Text
    browserPoolOptions: {
        operationTimeoutSecs: 40,
        retireBrowserAfterPageCount: 3,
        maxOpenPagesPerBrowser: 3,
        closeInactiveBrowserAfterSecs: 30,
    }


Then I changed a few settings... and there are no errors any more:
Plain Text
    browserPoolOptions: {
        operationTimeoutSecs: 30,
        retireBrowserAfterPageCount: 2,
        maxOpenPagesPerBrowser: 10,
        closeInactiveBrowserAfterSecs: 200,
    }


Probably the low value of closeInactiveBrowserAfterSecs (30) caused these errors, but I am not sure.

The very low value of retireBrowserAfterPageCount is needed to change the browser fingerprint often: my goal is to have a
unique fingerprint per request. With retireBrowserAfterPageCount=2 I get a unique fingerprint every two requests, which isn't perfect, but it's not bad.
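For reference, the retirement-based approach boils down to something like the sketch below (a sketch only; useFingerprints is Crawlee's fingerprint switch, but double-check all option names and values against the docs for your Crawlee version):

```javascript
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    launchContext: { launcher: firefox },
    browserPoolOptions: {
        useFingerprints: true,          // generate a fingerprint per browser instance
        retireBrowserAfterPageCount: 2, // new browser (=> new fingerprint) every 2 pages
        maxOpenPagesPerBrowser: 10,
        closeInactiveBrowserAfterSecs: 200,
    },
    async requestHandler({ page }) {
        // ... scrape the page ...
    },
});
```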



By the way, we discussed this "new fingerprint per new page in browser-pool" topic in the past (and there is still no good solution for use with PlaywrightCrawler as far as I understand, but that should be discussed separately):
https://discord.com/channels/801163717915574323/1060467542616965150/1062991696813625405
Hmm, interesting, I have never changed the default value of closeInactiveBrowserAfterSecs.

The new fingerprint per page (we call it a session, which is fingerprint + proxy IP) is just an experimental hack: https://crawlee.dev/api/browser-pool/class/LaunchContext#experimentalContainers
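Per the linked docs, that experimental flag goes on the launch context; a minimal sketch (experimental feature, so verify against the docs for your Crawlee version):

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    launchContext: {
        // Experimental: aims at a fresh fingerprint + proxy session per page.
        experimentalContainers: true,
    },
    async requestHandler({ page }) {
        // ... scrape the page ...
    },
});
```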