Apify and Crawlee Official Forum

Updated 4 months ago

websocket error during Apify run that is not from our code

In the middle of a Python Playwright run, we are getting this error:
Plain Text
ERROR Error in websocket connection
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1301, in close_connection
    await self.transfer_data_task
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 974, in transfer_data
    await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apify/event_manager.py", line 222, in _process_platform_messages
    async for message in websocket:
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 498, in __aiter__
    yield await self.recv()
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received


We do not have any websocket code in our Actor. The traceback does not have any reference to our code.
7 comments
Our Actor code is

Plain Text
async with Actor:
    actor_input = await Actor.get_input() or {}
    proxy_settings = actor_input.get('proxies', None)
    proxy_configuration = await Actor.create_proxy_configuration(actor_proxy_input=proxy_settings)
    proxy = await proxy_configuration.new_proxy_info('session0')
    # Launch Playwright and open a new browser context
    Actor.log.info('Launching Playwright...')
    async with async_playwright() as playwright:
        # Keep only the part of the proxy URL after the '@' (drops scheme and credentials)
        server = proxy['url'][proxy['url'].find('@')+1:]
        print("Determined server is %s" % (server))
        proxy_block = {
            "server": server,
            "username": proxy['username'],
            "password": proxy['password']
        }
        browser = await playwright.chromium.launch(headless=Actor.config.headless, proxy=proxy_block)
        context = await browser.new_context()
        tasks = []

        task = asyncio.create_task(worker('worker-0', context))
        tasks.append(task)
        # Wait until all worker tasks finish or one throws an exception
        try:
            await asyncio.gather(*tasks)
        except Exception as e:
            raise e
        print("All done ")


The worker() task is our crawler that opens a page in playwright and parses a small piece out. We're not sure where, if anywhere, a websocket is running. Since the code is coming from apify/event_manager.py we think it's some Apify global code that's running, but we can't find it.
Hi, could you please provide a full reproducible code sample? (what is async_playwright and worker?) Is it happening both locally and on the platform?
I'll work from a few assumptions.
async_playwright is the standard entry point to the asynchronous Playwright API; see the documentation: https://playwright.dev/python/docs/api/class-playwright.

Judging by the code, worker is a coroutine that takes the browser context it is given and runs the web scraping script.

I am somewhat confused by this code section.

Plain Text
tasks = []

task = asyncio.create_task(worker('worker-0', context))
tasks.append(task)


Is it true that there is always only one task in your tasks? Or did you change the code when copying it here?

Also, I don't see the browser closing anywhere.

Plain Text
await browser.close()

I ask because the error may occur if the browser was not closed correctly.
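One way to guarantee the close call runs even when the worker raises is a try/finally wrapper. The following is a minimal stdlib-only sketch, not code from the Actor: run_and_close and StubBrowser are hypothetical names used for illustration, and the stub only stands in for a Playwright browser so the example runs without Playwright installed. In the Actor, the real browser object (whose close() is awaitable in the async API) would be passed instead.

```python
import asyncio


async def run_and_close(browser, work):
    """Await work(browser) and always close the browser afterwards.

    `browser` is any object with an awaitable close() method;
    `work` is a coroutine function that receives the browser.
    """
    try:
        return await work(browser)
    finally:
        # Runs on success, on exceptions, and on task cancellation.
        await browser.close()


class StubBrowser:
    """Hypothetical stand-in for a Playwright browser (assumption, for demo only)."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def main():
    browser = StubBrowser()

    async def scrape(_browser):
        # Placeholder for the real worker logic.
        return "content"

    result = await run_and_close(browser, scrape)
    print(result, browser.closed)


if __name__ == "__main__":
    asyncio.run(main())
```

With this shape, an exception inside the worker still propagates to the caller, but only after the browser has been closed.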
There is just one task per Actor. We ported this from another Actor we maintain that does multiple. But since this boots a browser, we keep it to one task.
async_playwright comes from: from playwright.async_api import async_playwright
The worker code is simply:
Plain Text
context = await browser.new_context(proxy=[we give it a new proxy])
page = await context.new_page()
await page.goto(url)
content = await page.content()
# do stuff with content

I did a big run last night and the websocket error has not returned as far as I can see. I'm wondering if it was something spurious. Can you give any context on where _process_platform_messages is supposed to run and why it would time out?
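Some context that can be read from the traceback itself: the "keepalive ping timeout" text is produced by the websockets library, which periodically sends a ping and closes the connection with code 1011 when no pong arrives in time. The connection in question is the one the Apify SDK's event manager iterates over in _process_platform_messages. Possible causes include a network interruption or an event loop kept busy by long synchronous work; which one applied in this run is not known. A minimal stdlib-only sketch for checking the second possibility measures how late asyncio.sleep wakes up. The names measure_loop_lag and demo are hypothetical, and the time.sleep call only simulates blocking work.

```python
import asyncio
import time


async def measure_loop_lag(duration: float, interval: float = 0.05) -> float:
    """Return the worst extra delay (in seconds) seen when sleeping for `interval`.

    A large value means something held the event loop and kept other
    coroutines, such as keepalive handling, from running on time.
    """
    worst = 0.0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        start = time.monotonic()
        await asyncio.sleep(interval)
        worst = max(worst, time.monotonic() - start - interval)
    return worst


async def demo() -> float:
    # Start the monitor in the background, then block the loop on purpose.
    monitor = asyncio.create_task(measure_loop_lag(0.5))
    await asyncio.sleep(0.1)
    time.sleep(0.2)  # synchronous call: nothing else can run during this time
    return await monitor


if __name__ == "__main__":
    print(f"worst event loop lag: {asyncio.run(demo()):.3f}s")
```

Running a monitor like this alongside the worker and logging the result would show whether the scraping code ever stalls the loop for long stretches; a consistently small lag would point toward a network-side cause instead.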