Apify Discord Mirror

Updated 5 months ago

Download .xml.gz sitemaps

At a glance

A community member is trying to parse sitemaps from a website that uses .xml.gz files, and they are looking for a way to decompress these files in the Crawlee library, which only has the "downloadListOfUrls" method. Other community members have shared that they have parsed these files using tools from Node.js, and one community member has offered to share their solution. However, there is no explicitly marked answer in the comments.

I'm trying to parse the sitemaps of a website that serves them as .xml.gz files. In Python I could use gunzip to decompress them before using them.
In Crawlee we only have the "downloadListOfUrls" method. How can I decompress those files before using them?
sitemap: https://www.zoro.com/sitemaps/usa/sitemap-product-10.xml.gz
6 comments
I parsed them, but using tools from Node.
It would be nice to have those built into Crawlee.
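For anyone landing here, one way to do this with Node's built-in tools might look like the sketch below. It is not the solution offered in this thread (that was never posted here), just a minimal example assuming Node 18+ for the global `fetch`. The helper names `parseGzippedSitemap` and `downloadGzippedSitemap` are made up for illustration, and the regex is a shortcut where a real XML parser would be more robust.

```typescript
import * as zlib from "node:zlib";

// Hypothetical helper: decompress a sitemap if needed and collect its <loc> URLs.
function parseGzippedSitemap(bytes: Uint8Array): string[] {
    // Gzip data starts with the magic bytes 0x1f 0x8b. If they are missing,
    // assume the body is already plain XML (e.g. the HTTP client decoded it).
    const isGzip = bytes.length > 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
    const xml = isGzip
        ? zlib.gunzipSync(bytes).toString("utf8")
        : Buffer.from(bytes).toString("utf8");

    const urls: string[] = [];
    const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
    let match: RegExpExecArray | null;
    while ((match = locPattern.exec(xml)) !== null) {
        // Sitemaps escape "&" as "&amp;", so undo that for the final URL.
        urls.push(match[1].replace(/&amp;/g, "&"));
    }
    return urls;
}

// Hypothetical helper: download a .xml.gz sitemap and return the URLs it lists.
async function downloadGzippedSitemap(url: string): Promise<string[]> {
    const response = await fetch(url);
    if (!response.ok) {
        throw new Error(`Failed to download ${url}: ${response.status}`);
    }
    return parseGzippedSitemap(Buffer.from(await response.arrayBuffer()));
}
```

The returned array should then be usable like the output of "downloadListOfUrls", for example by passing it to the crawler as its start URLs.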
Replied in a different thread. Also passed the question/suggestion to the team 👍
I can share my solution if needed
If you don't mind - I could definitely pass it to the team 👍 thanks