Apify and Crawlee Official Forum

Updated 2 weeks ago

Goodbye Crawlee (migrated to Hero)

I migrated my scraping code from Crawlee to Hero (see https://github.com/ulixee/hero). It works: everything that worked with Crawlee also works with Hero.

Why I migrated: I could no longer put up with the over-engineered Crawlee API (and the bugs related to it).
There were just too many APIs (different APIs!) for my simple case.
Hero's API is about 5 times simpler.

In both cases (Crawlee and Hero) I use only the scraping library itself: no additional (cloud) services, no Docker containers.

I am not manipulating the DOM, not doing retries, not doing anything complex in TypeScript. I just open the URL (in some cases URL1 first and then URL2, to pretend I'm a normal user), grab the rendered HTML, and that's it. All the HTML processing (extracting the data from the HTML) is done in a completely different program (written in a different programming language, not in TypeScript).
Retry logic is, again, implemented in that different program.
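For anyone who wants to picture that flow, here is a minimal sketch. It is not my actual code: `PageSession`, `FetchJob` and `fetchRenderedHtml` are hypothetical names made up for this post, and the interface only stands in for whatever browser library you plug in. With Hero, `goto` would presumably map to `hero.goto(url)`, and reading the rendered HTML to something like the document's `outerHTML`, but treat that mapping as an assumption.

```typescript
// Hypothetical minimal interface; NOT part of the Crawlee or Hero API.
// It only names the three things the scraper needs from a browser library.
interface PageSession {
  goto(url: string): Promise<void>;
  renderedHtml(): Promise<string>;
  close(): Promise<void>;
}

// Hypothetical job shape: the target URL plus an optional "URL1" to visit first.
interface FetchJob {
  url: string;
  warmUpUrl?: string;
}

// Visit the optional warm-up URL, then the target URL, and return the rendered HTML.
// No parsing and no retries here: both live in the other program.
async function fetchRenderedHtml(
  openSession: () => Promise<PageSession>,
  job: FetchJob,
): Promise<string> {
  const session = await openSession();
  try {
    if (job.warmUpUrl !== undefined) {
      await session.goto(job.warmUpUrl);
    }
    await session.goto(job.url);
    return await session.renderedHtml();
  } finally {
    // Always release the browser session, even if navigation failed.
    await session.close();
  }
}
```

Keeping the browser behind such a small interface is also what made the swap easy to reason about: only the part that implements it has to change.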

I use the beanstalkd message queue (see https://github.com/beanstalkd/beanstalkd/) between that "different program" and the scraper. So I just replaced the Crawlee-based scraper with a Hero-based scraper without touching other parts of the system. Usage of beanstalkd was already discussed in this forum: use search to find these discussions.
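To show how thin the queue side can be, here is a rough sketch of the beanstalkd text protocol messages a scraper like this would exchange: `reserve` a job, `put` a result, `delete` the finished job. The helper names are hypothetical, and the job body being a plain URL is an assumption for illustration, not a description of my real message format. The socket handling is left out; these are only the pure string helpers.

```typescript
const CRLF = "\r\n";

// beanstalkd counts the job body in bytes, not characters, so compute the UTF-8 length.
function utf8Length(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const codePoint = ch.codePointAt(0) as number;
    bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// "put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n"
// The default priority and TTR values are arbitrary choices for this sketch.
function putCommand(body: string, priority = 1024, delaySeconds = 0, ttrSeconds = 120): string {
  return `put ${priority} ${delaySeconds} ${ttrSeconds} ${utf8Length(body)}${CRLF}${body}${CRLF}`;
}

// "reserve\r\n" asks the server for the next ready job.
const RESERVE_COMMAND = `reserve${CRLF}`;

// "delete <id>\r\n" removes a job once it has been handled.
function deleteCommand(jobId: string): string {
  return `delete ${jobId}${CRLF}`;
}

// Parse the first line of a reserve reply: "RESERVED <id> <bytes>" (without the CRLF).
// The caller then reads <bytes> bytes of body plus a trailing CRLF from the socket.
// The job id is kept as a string because beanstalkd ids are 64-bit integers.
function parseReservedHeader(line: string): { jobId: string; bodyBytes: number } | null {
  const match = /^RESERVED (\d+) (\d+)$/.exec(line);
  if (match === null) {
    return null; // e.g. "TIMED_OUT" or "DEADLINE_SOON"
  }
  return { jobId: match[1], bodyBytes: Number(match[2]) };
}
```

Since the other program only ever sees these queue messages, it does not care which scraping library sits on the other end.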

Goodbye Crawlee.