Apify Discord Mirror

CupOfGeo
Joined December 23, 2024
Getting this "system is overloaded" message just trying to scrape two URLs. The check just keeps looping, for almost 10 mins now. I set the CPU to 4 and memory to 4 GB but still get this message. I know Cloud Run doesn't like threads and background tasks; is that the real issue? Not sure, wondering if anyone has run these crawlers on Cloud Run.
Plain Text
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
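
For reference, the "system is overloaded" message comes from Crawlee's autoscaled pool comparing its CPU/memory snapshots against configured limits, and inside a container it can misjudge how much memory is actually available. Below is a minimal sketch, assuming the Python Configuration exposes memory_mbytes and max_used_cpu_ratio fields and that the crawler accepts a configuration argument (verify both against your installed crawlee version), of passing explicit limits that match the Cloud Run service:
Python
# Sketch only: give Crawlee explicit resource limits so the autoscaled pool's
# overload check matches what the Cloud Run container actually has.
# Field names (memory_mbytes, max_used_cpu_ratio) are assumptions to verify
# against your crawlee version.
import asyncio

from crawlee.configuration import Configuration
from crawlee.crawlers import BeautifulSoupCrawler


async def main() -> None:
    config = Configuration(
        memory_mbytes=4096,       # the 4 GB assigned to the Cloud Run service
        max_used_cpu_ratio=0.95,  # how much CPU use counts as "overloaded"
    )
    crawler = BeautifulSoupCrawler(configuration=config)
    await crawler.run(["https://example.com"])


if __name__ == "__main__":
    asyncio.run(main())
The same limits can usually be supplied through environment variables on the Cloud Run service (e.g. CRAWLEE_MEMORY_MBYTES) instead of in code.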
1 comment
Hello, so ideally I would like to have a file per website I'm scraping (so some will contain more than one handler per .py file). I'm thinking about what the best pattern for that is. I was just going from the docs and have router = Router[BeautifulSoupCrawlingContext]() as a global var in my routes.py, but I would need to either pass that router around as a singleton into the different handler files, or import the files into the one routes.py and register the handlers there, which sounds better. But then I have something like webpage_handler.py, which has my handler_one(context) and handler_two(context), and I register them in routes with the snippet below. Which is fine, but it doesn't look too pretty.
Plain Text
from crawlee.crawlers import BeautifulSoupCrawlingContext  # import paths vary slightly by crawlee version
from crawlee.router import Router
from webpage_handler import handler_one, handler_two

router = Router[BeautifulSoupCrawlingContext]()

@router.handler("my_label")
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    await handler_one(context)  # assuming these helpers are async; drop await if not
@router.handler("another_label")
async def handler_another_name(context: BeautifulSoupCrawlingContext) -> None:
    await handler_two(context)



To be honest I'm not super sure, wondering if someone already has a nice pattern that works.
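
One pattern that might work (a sketch only; the module and file names below are made up): keep the shared Router in routes.py, let each per-site handler file import that router and register its handlers with the decorator directly, and have main.py import the handler modules so the registrations actually run. Then routes.py never needs to know about the individual handler functions.
Python
# routes.py -- only the shared router lives here
from crawlee.crawlers import BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()


# site_a_handlers.py (hypothetical name) -- one file per website,
# handlers register themselves on the shared router
from crawlee.crawlers import BeautifulSoupCrawlingContext
from routes import router

@router.handler("my_label")
async def handle_site_a(context: BeautifulSoupCrawlingContext) -> None:
    ...


# main.py -- importing the handler modules is what triggers registration
import site_a_handlers  # noqa: F401  (imported for its @router.handler side effects)
from crawlee.crawlers import BeautifulSoupCrawler
from routes import router

crawler = BeautifulSoupCrawler(request_handler=router)
The trade-off is that registration happens as an import side effect, so forgetting to import a handler module in main.py means that label silently has no handler.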
1 comment