Apify and Crawlee Official Forum

Kike
Joined October 24, 2024
For some reason, when I push data to user_data (str type in this case) and then read the user_data variables in another handler, I get different values.
In this case the problem is with tab_number. When I push tab_number to user_data, the values seem correct (they range from 1 to 100). But when I read tab_number in tab_handler, I get a different value.
For example, for values from 1 to 19, I get tab_number 1 instead of the correct value: tab_number pushed to user_data: "19", tab_number read from user_data: "1".
I cannot find the error. Here is the code:

@router.handler('tabs')
async def tabs_handler(context: PlaywrightCrawlingContext) -> None:
    tab_id = context.request.user_data["tab_id"]

    await context.page.wait_for_selector('#tabsver > ul > li > a')

    tabs = await context.page.locator('#tabsver > ul > li > a').all()

    for tab in tabs:
        tab_name = await tab.text_content()
        tab_number = tab_name.replace("Tab number ", "").strip()
        if tab_name:
            await context.enqueue_links(
                selector=f'#tabsver > ul > li > a:has-text("{tab_name}")',
                label="tab",
                user_data={"tab_id": tab_id, "tab_number": tab_number},
            )

@router.handler('tab')
async def tab_handler(context: PlaywrightCrawlingContext) -> None:
    tab_id = context.request.user_data["tab_id"]
    tab_number = context.request.user_data["tab_number"]
1 comment
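A possible explanation, offered as an assumption rather than something confirmed in this thread: Playwright's `:has-text()` pseudo-class does a case-insensitive substring match, so a selector built from "Tab number 1" would also match "Tab number 10" through "Tab number 19" and "Tab number 100". Because requests are deduplicated by URL, the first enqueue of a link would win, and the later enqueue carrying the correct tab_number would be dropped. The pure-Python sketch below only simulates that behaviour; the URLs, the `simulate` helper, and the two matcher functions are hypothetical stand-ins, not Crawlee or Playwright APIs.

```python
def has_text(link_text: str, needle: str) -> bool:
    """Rough stand-in for `:has-text()`: case-insensitive substring match."""
    return needle.lower() in link_text.lower()


def text_is(link_text: str, needle: str) -> bool:
    """Rough stand-in for an exact-text match such as `:text-is()`."""
    return link_text.strip() == needle.strip()


def simulate(matcher) -> dict[str, str]:
    """Replay the loop from tabs_handler against 100 made-up tab links."""
    # Hypothetical link texts and URLs for tabs 1..100.
    links = {f"Tab number {n}": f"https://example.com/tab/{n}" for n in range(1, 101)}
    queue: dict[str, str] = {}  # url -> tab_number; first enqueue wins (deduplication)
    for tab_name in links:
        tab_number = tab_name.replace("Tab number ", "").strip()
        for text, url in links.items():
            if matcher(text, tab_name):
                # setdefault keeps the value from the first enqueue of this URL.
                queue.setdefault(url, tab_number)
    return queue


loose = simulate(has_text)
strict = simulate(text_is)

print(loose["https://example.com/tab/19"])   # value stored under substring matching
print(strict["https://example.com/tab/19"])  # value stored under exact matching
```

If this is indeed the cause, two options seem worth trying: switch the selector to an exact-text match (Playwright documents `:text-is("...")` for that), or read each link's href and enqueue the requests individually so that every URL is paired with its own user_data.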
O
I get the following error: The session has been lost.
1 comment
A
I have a list of results, and I enqueue the link for each item. For each item, I need to crawl its internal pages (tabs), extract the data from the tables, and add that data to the same dict. I can extract the data from all the pages with the router and enqueue_links, but I cannot gather all the data into a single dict per item. What is the best way to do this?
4 comments
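One way this could be approached, as a sketch under assumptions and not the answer given in the thread: pass an item identifier along in user_data to every tab request, and have each tab handler merge its table rows into a shared dict keyed by that identifier. The names `results`, `record_tab`, and `is_complete`, and the sample rows, are hypothetical and only illustrate the merging step in plain Python.

```python
from collections import defaultdict

# Shared store: item_id -> {tab_number -> rows extracted from that tab}.
results: dict[str, dict[str, list[dict]]] = defaultdict(dict)


def record_tab(item_id: str, tab_number: str, rows: list[dict]) -> None:
    """Merge the rows scraped from one tab into the dict for its item."""
    results[item_id][tab_number] = rows


def is_complete(item_id: str, expected_tabs: int) -> bool:
    """True once every expected tab of the item has been recorded."""
    return len(results[item_id]) == expected_tabs


# Made-up data standing in for what two tab handlers would extract.
record_tab("item-1", "1", [{"name": "a", "value": 1}])
record_tab("item-1", "2", [{"name": "b", "value": 2}])

if is_complete("item-1", expected_tabs=2):
    print(dict(results["item-1"]))
```

Inside a real handler, the call would presumably look like `record_tab(context.request.user_data["item_id"], ...)`, with the completed dict pushed to the dataset once `is_complete` returns true. An in-memory dict does not survive a restart, so a persistent store may be preferable for long crawls.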