Apify and Crawlee Official Forum

Updated 3 months ago

infinite_scroll | how to get the updated page

Hey, I created this simple script:
Plain Text
import asyncio

# Instead of BeautifulSoupCrawler let's use Playwright to be able to render JavaScript.
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
import urllib.parse


async def main(terms: str) -> None:
    crawler = PlaywrightCrawler(headless=False)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Wait for the collection cards to render on the page. This ensures that
        # the elements we want to interact with are present in the DOM.
        await context.page.wait_for_load_state("networkidle")
        await context.infinite_scroll()

    url = f"https://www.youtube.com/results?search_query={urllib.parse.quote(terms)}&sp=EgIwAQ%253D%253D"
    await crawler.run([url])


if __name__ == "__main__":
    asyncio.run(main("music"))
`

But I want to get the content of the page while infinite_scroll scroll the page, like that I can see the new content and I can make action according to them, but await context.infinite_scroll() never stop so I can't put an action behind it to run the thinkg I want, how can I manage that? (I want to get the new link of youtube video)
O
J
2 comments
In general, infinite scroll is not reliable, as it can be really infinite, what can lead to memory leaks.

On websites with lazy-loading pagination, if API scraping is a viable option, it is a much better approach due to reliability and performance.

AFAIK, youtube has API endpoints for scrolling results. Try to Check it in devTools.

Otherwise, I'd advice to create your own scrolling logic with processing new items an a loop.
E.q. here (it's JS, but logic remains the same):
https://docs.apify.com/academy/puppeteer-playwright/common-use-cases/paginating-through-results#lazy-loading-pagination
Thanks for the answer! I found a workaround (context.page.on("request", track_request)) but I will try to use the API endpoint for that
Add a reply
Sign up and join the conversation on Discord