Apify

Apify and Crawlee Official Forum


Extracting text from list elements

I want to extract the text from all <li> elements inside an unordered list <ul>.
Calling await page.locator("div.my_class > ul > li").textContent(); fails with an error: strict mode violation: locator('div.my_class > ul > li') resolved to x elements. Multiple matches are expected here, since the selector targets the items of a list.
Playwright itself doesn't seem to have a problem with selectors that match multiple elements. I did find the strictSelectors parameter in the Crawlee docs, but couldn't work out how to set it to false (if that is even the solution).
In Scrapy, item.add_css("list", "div.my_class > ul > li::text") returns a list with the text of each list item, which is what I'm looking for.
Does anyone know how to solve this?
2 comments
You can try Crawlee's parseWithCheerio function: https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#parseWithCheerio
and then extract the text with Cheerio functions.
Or you can use $$eval:
https://playwright.dev/docs/api/class-page#page-eval-on-selector-all
await page.$$eval('div.my_class > ul > li', (els) => els.map((x) => x.textContent))
I'm writing this from memory, so I'm not sure it is exactly right, but something like this should work.
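For the parseWithCheerio route mentioned above, here is a rough sketch of what the Cheerio side could look like. The helper name extractListTextWithCheerio is made up for illustration, and the code assumes `$` is the Cheerio function you get back from `await parseWithCheerio()` in the request handler.

```javascript
// Sketch: collect the text of every element matching `selector` using a
// Cheerio-style `$` function (assumed to come from `await parseWithCheerio()`).
// `extractListTextWithCheerio` is a hypothetical helper name.
function extractListTextWithCheerio($, selector) {
    return $(selector)
        // Cheerio's map callback receives (index, element)
        .map((_index, el) => $(el).text().trim())
        // .get() converts the Cheerio collection into a plain array
        .get();
}

// Assumed usage inside a PlaywrightCrawler requestHandler:
//   async requestHandler({ parseWithCheerio }) {
//       const $ = await parseWithCheerio();
//       const listText = extractListTextWithCheerio($, 'div.my_class > ul > li');
//   }
```

Cheerio works on a snapshot of the page HTML, so this only sees whatever was rendered at the moment parseWithCheerio was called.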

Or, as the docs say, you can do the same with locator.evaluateAll:
https://playwright.dev/docs/api/class-locator#locator-evaluate-all
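A minimal sketch of the evaluateAll variant, untested against a real page. It assumes `page` is a Playwright Page, and extractListTextWithLocator is just an illustrative helper name.

```javascript
// Sketch: read the text of all matching elements through a locator.
// `page` is assumed to be a Playwright Page; `extractListTextWithLocator`
// is a hypothetical helper name.
async function extractListTextWithLocator(page, selector) {
    // evaluateAll hands the whole array of matched elements to the callback,
    // which runs in the browser, so matching several elements is not an error.
    return page
        .locator(selector)
        .evaluateAll((els) => els.map((el) => el.textContent));
}

// Assumed usage:
//   const listText = await extractListTextWithLocator(page, 'div.my_class > ul > li');
```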
Thanks, using $$eval works:
const list_text = await page.$$eval("div.my_class > ul > li", (els) => {
    return els.map((el) => el.textContent);
});
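For anyone who wants to stay with locators: as far as I can tell from the Playwright docs, strictness only applies to locator calls that expect a single element (like textContent()), and the strictSelectors context option does not change locator behavior. The locator method allTextContents() is meant for the multi-element case. Below is a sketch that also trims whitespace and drops empty entries; extractCleanListText is a hypothetical helper name and `page` is assumed to be a Playwright Page.

```javascript
// Sketch: get the text of every matching element via locator.allTextContents(),
// then tidy it up. `extractCleanListText` is a hypothetical helper name and
// `page` is assumed to be a Playwright Page.
async function extractCleanListText(page, selector) {
    // allTextContents() resolves to an array with one string per matched element
    const texts = await page.locator(selector).allTextContents();
    return texts
        .map((text) => text.trim())          // strip leading/trailing whitespace
        .filter((text) => text.length > 0);  // drop items that are empty after trimming
}

// Assumed usage:
//   const listText = await extractCleanListText(page, 'div.my_class > ul > li');
```

The trimming step is optional; it is there because textContent often carries the indentation and newlines from the HTML source.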