Shine

How to retry when hit with 429

With crawlee-js this works fine, but with the Python version a 429 response is not retried. Is there anything I am missing?
I am using BeautifulSoupCrawler. Please help.
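
For reference, the behaviour being asked for (retry a request while the server answers 429) can be sketched without any library. This is not Crawlee's API; `fetch_with_retry`, its parameters, and the `(status, body)` return shape are hypothetical names chosen for illustration, and the backoff values are assumptions.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `fetch` again while it returns HTTP 429, up to `max_retries` times."""
    status, body = fetch()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
        status, body = fetch()
    return status, body
```

Passing `sleep` as a parameter is only there so the wait can be replaced in tests; how this maps onto BeautifulSoupCrawler's own retry settings depends on the Crawlee version.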

12 comments

Shine

Playwright increase timeout

When using Playwright with proxies, pages sometimes take longer to load. How can I increase the load timeout?

Page.goto: Timeout 30000ms exceeded
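
The 30000 ms in the error is Playwright's default navigation timeout. In plain Playwright it can be raised per call via the `timeout` argument of `page.goto`, or per page via `page.set_default_navigation_timeout`. A minimal sketch, assuming you have access to the Playwright `page` object; the helper name and the 60-second value are illustrative, and whether Crawlee lets you apply this before its own navigation depends on the version:

```python
NAVIGATION_TIMEOUT_MS = 60_000  # assumed value; the default seen in the error is 30 000 ms


async def goto_with_longer_timeout(page, url, timeout_ms=NAVIGATION_TIMEOUT_MS):
    """Navigate with a longer timeout than Playwright's default."""
    # Raises the default for later navigations on this page as well.
    page.set_default_navigation_timeout(timeout_ms)
    # `timeout` is given in milliseconds.
    return await page.goto(url, timeout=timeout_ms)
```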

27 comments

Shine

Proxy configuration am I doing it wrong?

async def main() -> None:
    """The crawler entry point."""
    proxy_configuration = ProxyConfiguration(
        proxy_urls=[
            'https://xxx:xx/',
            'https://xxx:xx/',
        ]
    )
    crawler = PlaywrightCrawler(
        headless=False,
        request_handler=router,
        max_requests_per_crawl=50,
        proxy_configuration=proxy_configuration,
    )

I am using PlaywrightCrawler, and when I run this code it shows the following error:

 playwright._impl._api_types.Error: Browser needs to be launched with the global proxy. If all contexts override the proxy, global proxy will be never used and can be any string, for example "launch({ proxy: { server: 'http://per-context' } })"
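
The error text itself describes the workaround in plain Playwright: launch the browser with a placeholder global proxy, then set the real proxy on each context. A minimal sketch of that pattern, assuming direct access to a Playwright browser type (for example `playwright.chromium`); the helper name is hypothetical, and how to pass the launch option through PlaywrightCrawler is not shown because it depends on the Crawlee version:

```python
async def launch_with_context_proxy(browser_type, proxy_url):
    """Launch with a dummy global proxy and apply the real proxy per context."""
    # Placeholder server taken from the error message; it is never actually used
    # as long as every context overrides the proxy.
    browser = await browser_type.launch(proxy={"server": "http://per-context"})
    # The real proxy is configured on the context.
    context = await browser.new_context(proxy={"server": proxy_url})
    return browser, context
```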

4 comments

Shine

Stop Crawlee When Condition Met

I am trying to scrape an e-commerce site and would like to scrape only 20 items. How can I stop the process once that many items have been scraped?
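
One library-agnostic way to express the condition is a small collector that counts stored items and reports when the limit is reached. The `ItemCollector` class below is a hypothetical helper, not part of Crawlee; a request handler could check `reached` and return early instead of storing data or enqueuing more links. How to actually halt the crawler at that point depends on the Crawlee version.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ItemCollector:
    """Collect items up to `limit` and report when the limit is reached."""

    limit: int = 20
    items: List[Any] = field(default_factory=list)

    @property
    def reached(self) -> bool:
        return len(self.items) >= self.limit

    def add(self, item: Any) -> bool:
        """Store `item` if there is room; return True once the limit is reached."""
        if not self.reached:
            self.items.append(item)
        return self.reached
```

Items offered after the limit are dropped rather than stored, so the result never exceeds 20 entries even if several handlers finish at about the same time.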

2 comments

Apify Discord Mirror
