Apify

Apify and Crawlee Official Forum

Jourdelune
Joined August 30, 2024
Hey, I created this simple script:
import asyncio

# Instead of BeautifulSoupCrawler let's use Playwright to be able to render JavaScript.
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
import urllib.parse


async def main(terms: str) -> None:
    crawler = PlaywrightCrawler(headless=False)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Wait until network activity settles so the initial search results
        # have rendered before we start scrolling.
        await context.page.wait_for_load_state("networkidle")
        await context.infinite_scroll()

    url = f"https://www.youtube.com/results?search_query={urllib.parse.quote(terms)}&sp=EgIwAQ%253D%253D"
    await crawler.run([url])


if __name__ == "__main__":
    asyncio.run(main("music"))

But I want to read the content of the page while infinite_scroll is scrolling, so that I can see the new content and act on it. However, await context.infinite_scroll() never returns, so I can't put any code after it to run what I want. How can I handle that? (I want to collect the links of the new YouTube videos.)
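One possible way around this, as a rough sketch rather than an official Crawlee recipe, is to skip infinite_scroll and scroll manually in small steps, reading the links after each step. The helper name scroll_and_collect, the a#video-title selector, and the stopping rule (two rounds with no new links, or max_rounds reached) are all assumptions made for illustration. The page argument is expected to behave like a Playwright Page (mouse.wheel, eval_on_selector_all, wait_for_timeout).

```python
async def scroll_and_collect(page, selector="a#video-title", max_rounds=20, pause_ms=1000):
    """Scroll step by step and return the link hrefs in first-seen order.

    `page` is assumed to be a Playwright Page. The default selector is a
    guess at YouTube's markup and may need adjusting.
    """
    seen = {}  # a dict keeps insertion order, so links stay in first-seen order
    idle_rounds = 0

    for _ in range(max_rounds):
        # Read every matching link currently in the DOM.
        hrefs = await page.eval_on_selector_all(
            selector, "els => els.map(e => e.href)"
        )
        new_links = [h for h in hrefs if h and h not in seen]
        for href in new_links:
            seen[href] = None  # act on each new link here if needed

        # Stop once two consecutive rounds bring nothing new.
        idle_rounds = 0 if new_links else idle_rounds + 1
        if idle_rounds >= 2:
            break

        # Scroll down one step and give the page time to load more results.
        await page.mouse.wheel(0, 4000)
        await page.wait_for_timeout(pause_ms)

    return list(seen)
```

Inside request_handler this would replace the infinite_scroll call, for example links = await scroll_and_collect(context.page). Because the loop is bounded by max_rounds and the idle check, the handler returns and any code placed after it gets to run.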
2 comments
Jourdelune

Robots.txt

Hey, do you have any idea how to respect robots.txt? Do we have to code that ourselves?
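If the Crawlee version in use has no built-in option for this (I have not checked, so treat that as an assumption), Python's standard library already ships a robots.txt parser in urllib.robotparser. Below is a minimal sketch: the helper name make_robots_checker is made up for illustration, and it takes the robots.txt text as a string so that fetching the file stays up to you.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str, user_agent: str = "*"):
    """Return a function that tells whether `user_agent` may fetch a URL.

    `robots_txt` is the raw content of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def is_allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return is_allowed


# Example rules, invented for this sketch.
rules = "User-agent: *\nDisallow: /private/\n"
is_allowed = make_robots_checker(rules)
```

You could then call is_allowed(url) before enqueueing each link and drop the ones that return False. RobotFileParser can also download the file itself through set_url() followed by read(), if you prefer not to fetch it manually.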
5 comments