Apify and Crawlee Official Forum

Parsel Crawler way too dank with request speed

At a glance

A community member is building a crawler with Crawlee for Python and noticed that the Parsel crawler makes requests at a much higher frequency than the BeautifulSoup crawler; they are looking for a way to slow it down to avoid getting blocked. Another community member suggests delaying requests with asyncio.sleep(random.uniform(1, 3)). The original poster did not try this, but was curious why the two crawlers behave differently. A third community member explains that the difference is most likely down to the parsing library, since Parsel parses pages faster, and recommends passing ConcurrencySettings(max_tasks_per_minute=100) when creating the ParselCrawler instance to slow it down.

Hi everyone! I am creating a crawler using Crawlee for Python. I noticed that the Parsel crawler makes requests at a much higher frequency than the BeautifulSoup crawler. Is there a way to make the Parsel crawler slower, so we are less likely to get blocked? Thanks!
Marked as solution
Hi, @Rigos

The difference in speed most likely comes down to the parsing library itself, since parsing is a CPU-bound task. If Parsel parses the page faster (which it does), the crawler as a whole runs faster.

To slow the crawler down, I would recommend:
Python
from crawlee import ConcurrencySettings
from crawlee.crawlers import ParselCrawler  # import path in recent Crawlee for Python releases

# Cap the crawler at roughly 100 requests per minute.
crawler = ParselCrawler(concurrency_settings=ConcurrencySettings(max_tasks_per_minute=100))
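For a fuller picture, here is a minimal runnable sketch built around that setting. The start URL and the handler body are placeholders, and the import path may differ slightly between Crawlee for Python versions:
Python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext


async def main() -> None:
    # Roughly 100 requests per minute across the whole crawl.
    crawler = ParselCrawler(
        concurrency_settings=ConcurrencySettings(max_tasks_per_minute=100),
    )

    @crawler.router.default_handler
    async def handler(context: ParselCrawlingContext) -> None:
        # Placeholder extraction: read the page title with a Parsel CSS selector.
        title = context.selector.css('title::text').get()
        context.log.info(f'{context.request.url} -> {title}')

    # Placeholder start URL.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())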
5 comments
Hi, did you try using this to delay the requests?
await asyncio.sleep(random.uniform(1, 3))
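A delay like that would typically sit inside the request handler, before any parsing work. A minimal sketch, assuming the crawler and handler set up as in the example under the accepted answer:
Python
import asyncio
import random

from crawlee.crawlers import ParselCrawlingContext


@crawler.router.default_handler
async def handler(context: ParselCrawlingContext) -> None:
    # Sleep 1-3 seconds before touching the page, spreading requests out over time.
    await asyncio.sleep(random.uniform(1, 3))
    title = context.selector.css('title::text').get()
    context.log.info(f'{context.request.url} -> {title}')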
Hi, I did not. But I was also curious why it is happening. I would expect the actors to behave the same way.
Using max_tasks_per_minute will give you better control over your crawling speed, especially if you run into 429 blocking (requests hitting the server too frequently).
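As a rough sketch, the per-minute cap can also be combined with a concurrency ceiling; max_concurrency is assumed to be available on ConcurrencySettings in your Crawlee for Python version, so check the API reference:
Python
from crawlee import ConcurrencySettings
from crawlee.crawlers import ParselCrawler

# At most 5 requests in flight at once, and no more than 100 per minute overall.
settings = ConcurrencySettings(max_concurrency=5, max_tasks_per_minute=100)
crawler = ParselCrawler(concurrency_settings=settings)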
Thanks a lot!