Apify and Crawlee Official Forum

Updated 2 weeks ago

Best way to deal with multiple datasets ?

What is the best way to compile the datasets from multiple agents into a single dataset? Each agent runs its own tasks, and each task has its own set of runs, producing multiple datasets.

I found this very confusing. Zapier etc. don't really process the data as needed; it needs additional transformation. I thought the platform would have an option, since under Storage you can create your own dataset, but interestingly there is no way to internally link any existing datasets to it. Could someone explain and advise? Thanks
8 comments
Re: Had raised the same concern over chat support,

I am using multiple scraper agents, all running different tasks; each task has its own set of runs and its own dataset.
How can I combine all the data into a single dataset using just the Apify platform?
Attachment
Screenshot_2025-01-07_at_4.50.09_PM.png
I believe that named datasets can be "shared" across actors. So have each of your agents save to a predetermined dataset name (passed in as input arg?). Then the master dataset aggregation actor would read from all those named datasets to generate a new one.
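A rough sketch of that aggregation step in Python, in case it helps. It assumes a client object shaped like the `ApifyClient` from the `apify-client` package (`datasets().get_or_create(name=...)`, `dataset(id).iterate_items()`, `dataset(id).push_items(...)`). The function name `merge_named_datasets`, the dataset names, and the batch size are made up for illustration.

```python
from itertools import islice


def merge_named_datasets(client, source_names, target_name, batch_size=500):
    """Copy every item from the named source datasets into one named target.

    `client` is assumed to behave like apify_client.ApifyClient.
    Returns the number of items copied.
    """
    # Named datasets are looked up (or created) by name to get their IDs.
    target_id = client.datasets().get_or_create(name=target_name)["id"]
    target = client.dataset(target_id)

    total = 0
    for name in source_names:
        source_id = client.datasets().get_or_create(name=name)["id"]
        items = iter(client.dataset(source_id).iterate_items())
        # Push in batches so a large dataset is never held in memory at once.
        while True:
            batch = list(islice(items, batch_size))
            if not batch:
                break
            target.push_items(batch)
            total += len(batch)
    return total
```

With the real client this would be called roughly as `merge_named_datasets(ApifyClient(token), ["scraper-a", "scraper-b"], "combined")`, where the token and dataset names are placeholders. Running it repeatedly would append the same items again, so it is best pointed at a fresh target or combined with deduplication.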
@De you either write an actor that gets all the datasets and merges them, or you use a key-value store and push all the data there as well, like I do for deduplication etc.
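For the deduplication part, a minimal sketch of the idea in Python. It assumes each item carries some unique field (the `key_field` argument; `"id"` in the example is hypothetical) and that the set of already-seen keys is what you would persist between runs, for instance in a key-value store. The helper name `dedupe_items` is made up for illustration.

```python
def dedupe_items(items, key_field, seen_keys=None):
    """Return (unique_items, updated_seen_keys).

    Items whose `key_field` value was already seen are dropped.
    Items without the field are kept, since they cannot be compared.
    """
    seen = set(seen_keys or ())
    unique = []
    for item in items:
        key = item.get(key_field)
        if key is None:
            unique.append(item)
            continue
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique, seen


# Example: one key was seen in an earlier run, one duplicate is in this batch.
fresh, seen = dedupe_items(
    [{"id": 1}, {"id": 2}, {"id": 1}, {"name": "x"}],
    key_field="id",
    seen_keys={2},
)
```

Only the `fresh` items would then be pushed to the combined dataset, and `seen` saved back for the next run.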
i have two tasks atm
Attachment
Screenshot_2025-01-07_at_5.14.40_PM.png
One is the scraper; the other is another actor I'm using to merge all the data from the scrape,

but I need this to be done for multiple actors/Twitter scrapers, each of which will fetch different items and merge them into a single dataset.

Could you let me know whether this is only possible via actors that merge datasets, or whether the Apify platform itself has a way of merging various datasets automatically?
The requirement is that after every run, all scraper agents automatically add the scraped data to a centralised database, with no manual merging.
Maybe you can dump datasets into a shared S3 bucket? Something like https://console.apify.com/actors/zPS5oJWp7gcpJmxeX/input (have not tried it myself)
thanks will try it out