Apify and Crawlee Official Forum

Updated last week

Incremental Web scraping using Crawlee

Hey everyone.
I am currently scraping a website where new content (pages) is added frequently, for example a blog. When I run my scraper, it scrapes all pages successfully, but if I run it again tomorrow, after new pages have been added to the website, it starts scraping everything from scratch.

I would be thankful for any advice, ideas, solutions, or examples of re-scraping efficiently without crawling the entire site again.

Thank you in advance. 🙏🏻
3 comments
@titavilanova2 dm me
You can save the URLs you have already scraped somewhere (a simple file, or a named key-value store if you're using Crawlee). On later runs, collect all URLs, filter out the ones you have already seen, and scrape only the delta.
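A minimal sketch of that idea, assuming the plain-file variant. The file name `seen-urls.json` and the helpers `loadSeen`, `filterNew`, and `saveSeen` are made up for illustration and are not part of Crawlee:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical file that remembers which URLs earlier runs have scraped.
const SEEN_FILE = "seen-urls.json";

// Load the URLs recorded by earlier runs (empty set on the first run).
function loadSeen(path: string = SEEN_FILE): Set<string> {
  if (!existsSync(path)) return new Set<string>();
  return new Set<string>(JSON.parse(readFileSync(path, "utf8")) as string[]);
}

// Keep only the URLs that no earlier run has recorded (the delta).
function filterNew(collected: string[], seen: Set<string>): string[] {
  const unique = Array.from(new Set<string>(collected));
  return unique.filter((url) => !seen.has(url));
}

// Persist the union of old and newly scraped URLs for the next run.
function saveSeen(seen: Set<string>, scraped: string[], path: string = SEEN_FILE): void {
  const union = new Set<string>([...Array.from(seen), ...scraped]);
  writeFileSync(path, JSON.stringify(Array.from(union), null, 2));
}
```

On each run you would call `loadSeen()`, pass the URLs collected from the listing pages through `filterNew()`, enqueue only the result, and call `saveSeen()` once those pages have been scraped. With Crawlee, the same list could probably live in a named key-value store (`KeyValueStore.open`, `getValue`, `setValue`) instead of a local file.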
Or maybe check whether the site has a sitemap file.
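If the site does publish a sitemap, and if its entries carry the optional `<lastmod>` field, something like the following could narrow a run down to pages that are new or changed since the last one. This is only a sketch under those assumptions: `parseSitemap` and `changedSince` are hypothetical helpers, the regex parsing only handles a plain `<urlset>` sitemap (not a sitemap index), and XML entities in URLs are not decoded:

```typescript
interface SitemapEntry {
  loc: string;
  lastmod?: Date;
}

// Extract <loc> and the optional <lastmod> from every <url> element.
function parseSitemap(xml: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  const urlPattern = /<url>([\s\S]*?)<\/url>/g;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(xml)) !== null) {
    const block = match[1];
    const locMatch = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(block);
    if (!locMatch) continue;
    const lastmodMatch = /<lastmod>\s*([^<]+?)\s*<\/lastmod>/.exec(block);
    entries.push({
      loc: locMatch[1],
      lastmod: lastmodMatch ? new Date(lastmodMatch[1]) : undefined,
    });
  }
  return entries;
}

// Return URLs modified after lastRun. Entries with a missing or
// unparseable lastmod are included, since their age is unknown.
function changedSince(entries: SitemapEntry[], lastRun: Date): string[] {
  return entries
    .filter((entry) => {
      if (!entry.lastmod || isNaN(entry.lastmod.getTime())) return true;
      return entry.lastmod.getTime() > lastRun.getTime();
    })
    .map((entry) => entry.loc);
}
```

You would fetch the sitemap yourself, store the timestamp of each successful run, and pass it as `lastRun` next time. If the sitemap has no `<lastmod>` values, every URL comes back, so combining this with a list of already scraped URLs is the safer option.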