Error Target page, context or browser closed

Hello fellow developers,

I'm facing a consistent issue with Playwright in the Crawlee library context. Every time I perform an async operation on a locator instance, the page unexpectedly closes.

Here's the simplified code where the issue is evident:


const doesContainAllParts: AlertsProximityAnalyzerComparator<
  Frame | Page
> = async (element) => {
  try {
    const test = element.locator('body');
    const result = await test.count();  // Page closes unexpectedly here

    return result > 0;
  } catch (error) {
    console.error('Error in doesContainAllParts:', error);
    throw error;
  }
};

The issue specifically happens at the line const result = await test.count(). Each time this line executes, the page closes, leading to the failure of the operation.

Some key points:

The problem consistently occurs every time this code is executed.
I'm using the latest versions of Playwright and Crawlee.
The issue seems to be tied to the await operation on the locator instance.

I'm stumped as to why this is happening. Is this a known issue with Playwright or Crawlee, or could there be something wrong with my implementation? Any insights, suggestions, or similar experiences would be incredibly helpful.

Thanks a lot in advance for any assistance!

PS I'm adding a video with settings headless: false to show you how it looks

PPS: Here is the discussion on GitHub with more details: https://github.com/apify/crawlee/discussions/2185

10 comments

Could someone let me know what the problem could be here?

Pepa J

Hi, it's hard to say on this one. What is the value of the element variable there? I don't see it in the debugger variables toolbar. Is it just a regular Page object?

I crawl Frame objects.

Pepa J

So maybe page.frameLocator is what you want to use? https://playwright.dev/docs/frames

I tried that and it did not work. From my testing, the problem is probably that I resolve my promise too early: operations like await page.title(), await page.content(), etc. work fine inside requestHandler, but my logic looks different:


  private requestHandler: PlaywrightRequestHandler = async ({
    page,
    request,
    log,
  }) => {
    log.info(`Request to: ${request.url} ...`)
    await page.waitForLoadState('domcontentloaded')
    const title = await page.title() // works fine
    const content = await page.content() // works fine
    // Any other logic I paste here works fine, but I cannot paste it here because of my business logic and other dynamic data I provide here.

    await playwrightUtils.infiniteScroll(page, {
      scrollDownAndUp: true,
      waitForSecs: 2,
      timeoutSecs: 5,
    })
    this.resolvePromise(request.url, page)
  }

====
 private resolvePromise(url: string, result: Page): void {
    if (this.urlToPromiseResolver[url]) {
      this.urlToPromiseResolver[url].resolvePromise(result)
      delete this.urlToPromiseResolver[url]
    }
  }
====
  public resolve = async (urls: string[]): Promise<Page[]> => {
    const urlsWithUniqueKeys = urls.map((url) => ({
      url,
      uniqueKey: `${url}_${Math.random()}`,
    }))

    await this.crawler.addRequests(urlsWithUniqueKeys)

    const promises = urls.map((url) => {
      return new Promise<Page | null>((resolvePromise, rejectPromise) => {
        this.urlToPromiseResolver[url] = { resolvePromise, rejectPromise }
      })
    })

    const result = Promise.all(promises)
      .then(filterResults).catch(...)

    return result
  }
==== That's how I use it:
 const [page] = await playwrightCrawleePageResolver.resolve([url])

  const title = await page.title() //error

After debugging, it looks like the page closes as soon as I call the first await on the result of my resolve function.
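
To make the symptom easier to reason about, here is a minimal simulation of it. This is only a sketch under stated assumptions: FakePage, runRequest and demoLeakedPage are hypothetical names, not Playwright or Crawlee APIs, and the lifecycle (the page is closed as soon as the handler settles) is a simplified model of what the thread later confirms, not Crawlee's actual implementation.

```typescript
// Hypothetical stand-in for a Playwright Page: calls fail once it is closed.
class FakePage {
  private closed = false;

  async title(): Promise<string> {
    if (this.closed) throw new Error('Target page, context or browser closed');
    return 'Example title';
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}

type FakeHandler = (context: { page: FakePage; url: string }) => Promise<void>;

// Assumed, simplified lifecycle: the page is closed as soon as the handler settles.
async function runRequest(url: string, handler: FakeHandler): Promise<void> {
  const page = new FakePage();
  try {
    await handler({ page, url });
  } finally {
    await page.close();
  }
}

// Mirrors the resolver pattern from the post: the handler resolves a promise
// with the page itself, and the caller uses the page afterwards.
async function demoLeakedPage(): Promise<string> {
  const url = 'https://example.com';
  const resolvers: Record<string, (page: FakePage) => void> = {};
  const pending = new Promise<FakePage>((resolve) => {
    resolvers[url] = resolve;
  });

  await runRequest(url, async ({ page, url: handledUrl }) => {
    await page.title(); // works: the handler is still running
    resolvers[handledUrl]?.(page); // the page escapes the handler here
  });

  const page = await pending; // the promise resolves, but the page is already closed
  try {
    return await page.title();
  } catch (error) {
    return (error as Error).message;
  }
}
```

In this model the promise itself resolves without trouble; it is the first call on the escaped page that fails, which matches the behaviour described above.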

Pepa J

I am sorry, I cannot really follow your code. The only element that by syntax may contain a <body> element is the <html> element, which is why I asked for the value of the element parameter there. I already linked the official documentation on how to work with frames in Playwright, and unfortunately we don't know anything about the website that you are scraping. I suggest you check the link I provided, and maybe some further examples, to get an idea of how to scrape frames/iframes.

The problem lies in the way PlaywrightCrawler handles crawling: the page is open only within the scope of the requestHandler callback. I worked around this by passing a callback to my function, playwrightCrawleePageResolver.resolve([url], callback), and this callback is executed inside requestHandler.
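
The post does not show the final code, so the following is only a sketch of how such a callback-based resolver could look. CallbackPageResolver, register, handle and PageLike are hypothetical names introduced for illustration; PageLike stands in for the small part of Playwright's Page that the sketch touches.

```typescript
// Structural stand-in for the part of Playwright's Page used in this sketch.
interface PageLike {
  title(): Promise<string>;
}

type PageCallback<T> = (page: PageLike) => Promise<T>;

// Hypothetical helper, not part of Crawlee.
class CallbackPageResolver {
  private readonly pending = new Map<string, (page: PageLike) => Promise<void>>();

  // Registers work for a URL. The returned promise settles with the
  // callback's result (plain data), never with the page itself.
  register<T>(url: string, callback: PageCallback<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.pending.set(url, async (page) => {
        try {
          resolve(await callback(page));
        } catch (error) {
          reject(error);
        }
      });
    });
  }

  // Meant to be awaited inside requestHandler, so the page stays open
  // until the callback has finished.
  async handle(url: string, page: PageLike): Promise<void> {
    const run = this.pending.get(url);
    if (!run) return; // nothing registered for this URL
    this.pending.delete(url);
    await run(page);
  }
}
```

Under these assumptions, requestHandler would end with await resolver.handle(request.url, page), and the caller would register the callback before enqueuing the request, for example const title = await resolver.register(url, (page) => page.title()). Only the extracted data leaves the handler, so nothing touches the page after it is closed.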

Pepa J

I am sorry, I am not able to follow what you are trying to achieve.

PlaywrightCrawler handles crawling (the page is open only within the scope of requestHandler callback)

Yes, it does, so you should keep all the logic within the requestHandler. Of course you may call your own functions and methods; just keep in mind that when calling async functions you need to use the await keyword. Otherwise nothing waits for the result of your function, and the page might get closed before your function has been evaluated.
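
The effect of a missing await can be shown with a small simulation. As before, this is a sketch under assumptions: SlowFakePage, runHandler, readTitle and demoHelper are hypothetical names, and closing the page right after the handler settles is a simplified model of the crawler, not its real implementation.

```typescript
// Hypothetical stand-in for a page whose calls take one timer tick to complete.
class SlowFakePage {
  closed = false;

  async title(): Promise<string> {
    // Simulate the round trip to the browser before the answer arrives.
    await new Promise((resolve) => setTimeout(resolve, 0));
    if (this.closed) throw new Error('Target page, context or browser closed');
    return 'Example title';
  }
}

// Assumed, simplified lifecycle: the page is closed once the handler settles.
async function runHandler(handler: (page: SlowFakePage) => Promise<void>): Promise<void> {
  const page = new SlowFakePage();
  try {
    await handler(page);
  } finally {
    page.closed = true;
  }
}

// A user-defined async helper that records either the title or the error message.
async function readTitle(page: SlowFakePage, out: string[]): Promise<void> {
  try {
    out.push(await page.title());
  } catch (error) {
    out.push((error as Error).message);
  }
}

async function demoHelper(awaitHelper: boolean): Promise<string[]> {
  const out: string[] = [];
  let helper: Promise<void> = Promise.resolve();

  await runHandler(async (page) => {
    helper = readTitle(page, out);
    if (awaitHelper) await helper; // without this, the handler returns immediately
  });

  await helper; // let the helper finish so its outcome can be inspected
  return out;
}
```

In this model, demoHelper(true) records the title, while demoHelper(false) records the "Target page, context or browser closed" message, because the handler returns and the page is closed before the helper gets its answer.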

I encountered a challenge with maintaining all operations within the requestHandler. This stems from the fact that our crawler is initialized just once, yet it needs to continuously crawl the ever-changing internet for data. This dynamic nature of data necessitated a custom approach, allowing the crawler to dynamically accept varying arguments and functions. The key takeaway is that I've successfully resolved this issue. Thank you for your time and assistance.
