Apify and Crawlee Official Forum

Crawlee stops after about 30 items pushed to the dataset, and repeats the same data on the next run.

I'm writing my first Actor, using Crawlee with the Playwright crawler, to scrape the website https://sreality.cz.

I wrote a crawler using as much as possible from the examples in the documentation. It works like this:

  1. Start on the first page of the search results, for example this one.
  2. Skip the ad dialog if it shows.
  3. Find all links to next pages and add them to the queue with enqueueLinks().
  4. Find all links to individual items (apartments, houses, whatever) and add them to the queue with enqueueLinks().
  5. If the next page to process is an item page, scrape the data and save it with pushData(). Otherwise, if it's another list page, repeat from step 3.

In theory, this is all I need to scrape the entire search result list. However, what I experience is that the crawler enqueues all the links (around 185) but only processes around 30 of them before finishing. Very strange.
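
For context, the structure looks roughly like the sketch below. This is not the actual code: the selectors and the start URL are placeholders, and routing via a 'DETAIL' label is just one way to tell item pages apart from list pages.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, pushData, log }) {
        if (request.label === 'DETAIL') {
            // Step 5: item page – scrape the data and save it.
            log.info(`Scraping item ${request.url}`);
            await pushData({
                url: request.url,
                title: await page.title(),
                // ...other scraped fields...
            });
            return;
        }

        // Step 2: skip the ad dialog if it shows (placeholder selector).
        await page.click('.ad-dialog-close', { timeout: 5_000 }).catch(() => {});

        // Step 3: enqueue pagination links (placeholder selector).
        await enqueueLinks({ selector: 'a.pagination-link' });

        // Step 4: enqueue links to individual items, labelled so the handler
        // can tell them apart from list pages (placeholder selector).
        await enqueueLinks({ selector: 'a.item-link', label: 'DETAIL' });
    },
});

// Replace with the first page of the search results from step 1.
await crawler.run(['https://www.sreality.cz/']);
```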

I tried setting maxRequestsPerCrawl: 1000; it didn't help.
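
For reference, this is where that option goes in the sketch above. It is only an upper bound on the total number of requests the crawler will process, so raising it can't force the crawl to keep going if something else ends the run early.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Cap on the total number of requests processed in one crawl.
    maxRequestsPerCrawl: 1000,
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
    },
});
```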

Maybe I'm missing something, but I don't see why it would just stop after around 30 pages. Is there another config somewhere that controls this?

Even stranger, it then logs the final statistics, which say something like "requestsFinished":119. That number doesn't make sense at all: it's less than the number of enqueued links, but a lot more than the number of actually processed pages.
4 comments
You need to add some logging, for example at the start of the request handler.
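Something along these lines (a minimal sketch; exactly which fields you log is up to you):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, log }) {
        // Log every request as it enters the handler, so it's visible
        // which of the enqueued URLs actually get processed.
        log.info(`Handling ${request.url}`, {
            label: request.label,
            retryCount: request.retryCount,
        });
        // ...the rest of the handler...
    },
    // Also log requests that have exhausted all their retries.
    failedRequestHandler({ request, log }) {
        log.error(`Request failed too many times: ${request.url}`);
    },
});
```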
Hi, would you mind sharing the code so that we can have a look at it?
It's a bit of a mess right now, but I can share it. Do you mind if I share it privately?
Yes, sure. You can DM me.