Extracting text from list elements

At a glance

The community member wanted to extract the text from every li element inside an unordered list (ul). Calling textContent() on a Playwright locator() that matched several elements raised a strict mode violation error. They noted that Playwright itself has no problem with selectors that match multiple elements, and mentioned the strictSelectors parameter in the Crawlee docs, but couldn't get it to work.

In the comments, another community member suggested Crawlee's parseWithCheerio() or Playwright's $$eval() method to extract the text from the list items, and the original poster confirmed that $$eval() worked.

Jerome

I want to extract the text from all <li> elements inside an unordered list <ul>.
Trying await page.locator("div.my_class > ul > li").textContent(); throws an error: strict mode violation: locator('div.my_class > ul > li') resolved to x elements. Multiple matches are expected, since this is a list.
Playwright itself doesn't seem to have an issue with selectors that return multiple elements, and I did find the strictSelectors parameter in the Crawlee docs, but didn't manage to set it to false (if that is even the solution).
In Scrapy, item.add_css("list", "div.my_class > ul > li::text") returns a list with the text of each list item, which is what I'm looking for.
Does anyone know how to solve this?
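
According to Playwright's documentation, locators are always strict: single-element methods such as textContent() throw when the selector matches more than one element, and the strictSelectors context option does not affect the locator APIs. The locator API does, however, include allTextContents(), which returns one string per match. The sketch below is minimal and assumes Crawlee's PlaywrightCrawler hands your request handler a standard Playwright page. The helper name getListItemTexts is hypothetical.

```javascript
// Minimal sketch (hypothetical helper name): read the text of every <li>
// matched by the selector. allTextContents() returns one string per matching
// element, so it is not subject to the single-element strict-mode check
// that textContent() performs.
async function getListItemTexts(page, selector = "div.my_class > ul > li") {
  return page.locator(selector).allTextContents();
}

// Assumed usage inside a Crawlee PlaywrightCrawler request handler:
//
// const crawler = new PlaywrightCrawler({
//   async requestHandler({ page }) {
//     const listText = await getListItemTexts(page);
//   },
// });
```

allTextContents() reads whatever matches at the moment it runs. If the list is rendered asynchronously, you may want to wait for it first, for example with page.waitForSelector().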

HonzaS

You can try the Crawlee function parseWithCheerio: https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#parseWithCheerio
and then extract the text with Cheerio functions,
or you can use page.$$eval():
https://playwright.dev/docs/api/class-page#page-eval-on-selector-all
await page.$$eval('div.my_class > ul > li', (els) => els.map((x) => x.textContent))
I'm writing this from memory, so I'm not sure it's exactly right, but something like this should work.
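
For the parseWithCheerio() route suggested above, here is a minimal sketch. It assumes the handler receives Crawlee's PlaywrightCrawlingContext and that calling parseWithCheerio() with no arguments resolves to a Cheerio root $. A Cheerio selection can hold any number of elements, so no strict-mode check applies. The helper name is hypothetical.

```javascript
// Minimal sketch (hypothetical helper name): parse the current page with
// Cheerio and return the text of each matching <li>.
async function getListItemTextsWithCheerio(
  { parseWithCheerio },
  selector = "div.my_class > ul > li",
) {
  // Loads the page's current HTML into Cheerio and returns the $ root.
  const $ = await parseWithCheerio();
  // toArray() gives the raw element nodes; $(el).text() reads each one's text.
  return $(selector)
    .toArray()
    .map((el) => $(el).text());
}

// Assumed usage inside a PlaywrightCrawler request handler:
// async requestHandler(context) {
//   const listText = await getListItemTextsWithCheerio(context);
// }
```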

Or, as the docs describe, you can do the same with locator.evaluateAll():
https://playwright.dev/docs/api/class-locator#locator-evaluate-all
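
The locator.evaluateAll() variant can be sketched the same way, assuming a standard Playwright page. The helper name is hypothetical. The callback runs inside the browser and receives an array of all matching elements, so, as with $$eval, it is not limited to a single element.

```javascript
// Minimal sketch (hypothetical helper name): evaluateAll() passes every
// element matched by the locator to the callback, which runs in the browser.
async function getListItemTextsViaEvaluateAll(
  page,
  selector = "div.my_class > ul > li",
) {
  return page
    .locator(selector)
    .evaluateAll((elements) => elements.map((el) => el.textContent));
}
```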

Jerome

Thanks, using $$eval works:

// $$eval runs the callback in the browser with an array of all matching elements.
const list_text = await page.$$eval("div.my_class > ul > li", (els) => {
    return els.map((el) => el.textContent);
});
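
textContent returns the raw text of each node, including any indentation and line breaks from the HTML source. For tidier strings, you can add a small post-processing step. The helper below is hypothetical and not part of Playwright or Crawlee. Its whitespace rules are an assumption you may want to adjust.

```javascript
// Hypothetical helper: collapse runs of whitespace, trim each string,
// and drop items that end up empty (or were null).
function cleanListTexts(texts) {
  return texts
    .map((text) => (text ?? "").replace(/\s+/g, " ").trim())
    .filter((text) => text.length > 0);
}

// Example with raw strings of the kind $$eval might return:
console.log(cleanListTexts(["\n    First item\n  ", "Second   item", "   "]));
// prints [ 'First item', 'Second item' ]

// Assumed usage with the $$eval call from the thread:
// const list_text = cleanListTexts(
//   await page.$$eval("div.my_class > ul > li", (els) => els.map((el) => el.textContent)),
// );
```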

Apify Discord Mirror