Apify and Crawlee Official Forum


How to retry when hit with 429

When using crawlee-js it works fine, but when using the Python version, 429 responses are not retried. Is there anything I am missing?
I am using BeautifulSoupCrawler. Please help.
10 comments
Is it still an issue? Could you please provide a short code reproduction so we can check it?
Hi, sorry for the delayed reply. Yes, when we get a 429, a 403, or anything else in the 400 range, it is not retried.
Hi, could you please show a code sample? I'm wondering how you configure the crawler (max_request_retries and max_session_rotations), and whether you handle the error cases in some additional way.

Is it possible that when you get a 429 response a re-request is executed, but it happens too quickly and all the re-requests get a 429 status as well?

A 403 response status signals that you have been blocked. I don't think a re-request should be performed in this case; a session change is more appropriate.

A 400 usually signals that the request itself is invalid, so I don't think such requests should be repeated.

In general, if you are encountering 429 responses, it seems to me you should adjust ConcurrencySettings to reduce the aggressiveness of the scraping.
Also, which HTTP client are you using?
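For illustration, here is a rough sketch of less aggressive ConcurrencySettings; the specific values are assumptions for the example and would need tuning against the target site.
Plain Text
from crawlee import ConcurrencySettings

# Assumed example values; tune them for the target site.
concurrency_settings = ConcurrencySettings(
    max_concurrency=2,        # fewer parallel requests
    max_tasks_per_minute=30,  # throttle the overall request rate
)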
@Shine
Yes, please share some code to reproduce the issue, including your scraper configuration.
Also, provide logs or other evidence showing that your requests are not being retried on a 429 response.

Without this information, it’s difficult to assist, as your case seems quite unusual. By default, such requests should be retried automatically.
Hi, the code is below:
Plain Text
from apify import Actor, Request
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from .routes import router
from crawlee import ConcurrencySettings

async def main() -> None:
    concurrency_settings = ConcurrencySettings(
        max_concurrency=3,
    )
    async with Actor:
        # Create a crawler.
        crawler = BeautifulSoupCrawler(
            request_handler=router,
            max_requests_per_crawl=100,
            max_request_retries=10,
            concurrency_settings=concurrency_settings,
        )

        # Run the crawler with the starting requests.
        await crawler.run(['https://example.com'])
If there is a 403 error and we try again, the page is accessible, so I want to retry on this status code as well.
Yes, it looks like retries are not performed for status codes in the 400-499 range:

https://github.com/apify/crawlee-python/blob/master/src/crawlee/basic_crawler/_basic_crawler.py#L653

I don't think it's supposed to work that way
For now, my workaround is to use
Plain Text
ignore_http_error_status_codes=[403]
and then the request handler fails to find the expected elements, so the retry happens from there.
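For completeness, a minimal sketch of that workaround, assuming a default handler; the '#content' selector and the exception are placeholders, not from the original project. The 403 response is let through via ignore_http_error_status_codes, and raising from the handler sends the request back through the crawler's normal retry logic.
Plain Text
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler(
    max_request_retries=10,
    ignore_http_error_status_codes=[403],  # a 403 no longer fails the request outright
)

@crawler.router.default_handler
async def default_handler(context: BeautifulSoupCrawlingContext) -> None:
    # Placeholder check: on a block page the expected element is missing.
    if context.soup.select_one('#content') is None:
        # Raising here hands the request back to the retry logic (up to max_request_retries).
        raise RuntimeError('Blocked or incomplete page, retrying')
    # ...normal extraction continues here.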
I created an issue for this problem: https://github.com/apify/crawlee-python/issues/756

I'll post here when it's resolved.