Apify and Crawlee Official Forum

Updated 4 months ago

Request queues and saving on write usage

Hello, I'm creating a supermarket data scraper. The supermarket I'm scraping has a sitemap that lists the URL of every product. Currently I'm loading them like this:

const { urls } = await Sitemap.load('https://.../entities/products/detail.xml');

And then passing them to my crawler:
await crawler.run(urls);

However, this writes all of them to the default request queue again on every run. Writing 23,000+ items to the request queue costs me at least $0.50 each time. Is there any way I can write them to the request queue (or somewhere else) once, and then read from there on subsequent runs?
But the list of URLs in the sitemap is dynamic, no?
That's why you need to re-fetch it regularly if you want up-to-date information from your target site.

In your case, you can use a named request queue:

const queueWithName = await RequestQueue.open('some-name'); 

Or you could store all the URLs in a named key-value store, if that makes sense for you.
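A rough sketch of the key-value store idea that also deals with the list going stale: the URL list is cached as a single record and re-fetched only once it is older than `maxAgeMs`. It assumes `store` has the async `getValue(key)` / `setValue(key, value)` shape of a Crawlee `KeyValueStore`; `getCachedUrls`, `loadUrls`, `MemoryStore`, the `PRODUCT_URLS` key and the one-day default are hypothetical names and values for illustration. Note that this only saves re-downloading the sitemap; passing the returned URLs to `crawler.run()` would still add them to the request queue as described in the question.

```javascript
// Hypothetical helper: return a cached URL list, refreshing it when stale.
// `store` is assumed to expose async getValue(key) / setValue(key, value),
// like a Crawlee KeyValueStore (getValue resolves to null for a missing key).
// `loadUrls` is a hypothetical async function returning an array of URLs.
async function getCachedUrls(store, loadUrls, {
  key = 'PRODUCT_URLS',            // assumed record key
  maxAgeMs = 24 * 60 * 60 * 1000,  // assumed refresh interval: one day
  now = Date.now(),
} = {}) {
  const cached = await store.getValue(key);
  if (cached && now - cached.savedAt < maxAgeMs) {
    // Cache is fresh: skip the sitemap download and the store write.
    return cached.urls;
  }
  // Missing or stale: fetch the list again and save it as one record.
  const urls = await loadUrls();
  await store.setValue(key, { savedAt: now, urls });
  return urls;
}

// In-memory stand-in with the same two methods, for trying the logic locally.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }
  async getValue(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
  async setValue(key, value) {
    this.data.set(key, value);
  }
}
```

With Crawlee, `store` could come from `await KeyValueStore.open('some-name')`, and `loadUrls` could wrap the `Sitemap.load(...)` call from the question, e.g. `async () => (await Sitemap.load(sitemapUrl)).urls`.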