The community members are discussing the best way to combine multiple datasets from different agents and tasks into a single dataset. They have found that the Apify platform does not have a built-in option to link existing datasets, and they are looking for a solution. Some suggestions include:
1. Have each agent save to a predetermined dataset name, then use a "master dataset aggregation actor" to read from all those named datasets and generate a new one.
2. Use Apify's KV (key-value) store to push all the data, which can then be used for deduplication and merging.
3. Dump the datasets into a shared S3 bucket, though one community member has not tried this approach.
The requirement is that after each run, all scraper agents should automatically add the scraped data to a centralized database without the need for manual merging.
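Suggestion 2 above (push everything to one place, then deduplicate and merge) boils down to keying items on a field they share. A minimal sketch in plain Python, assuming the scraped items share a `url` field (the key name and sample data are illustrative assumptions):

```python
def merge_datasets(datasets, key="url"):
    """Merge several lists of scraped items into one list,
    deduplicating on a shared field; the first occurrence wins."""
    seen = {}
    for items in datasets:
        for item in items:
            # setdefault keeps the first item seen for each key value
            seen.setdefault(item[key], item)
    return list(seen.values())

# Illustrative outputs from two scraper agents:
agent_a = [{"url": "https://example.com/1", "title": "One"}]
agent_b = [
    {"url": "https://example.com/1", "title": "One (duplicate)"},
    {"url": "https://example.com/2", "title": "Two"},
]
merged = merge_datasets([agent_a, agent_b])
```

The same logic would run inside an aggregation actor after it has downloaded the items from each source dataset.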
What is the best way to compile the datasets into a single dataset when multiple agents are running their individual tasks, and each task has its own set of runs producing multiple datasets?
I found this very confusing. Zapier etc. don't really process the data the way it's needed; it requires additional transformation. I thought the platform would have an option, since there is an option under Storage to create your own dataset, but interestingly there is no way to internally link any existing datasets to it. Could you explain and advise? Thanks.
Re: I had raised the same concern over chat support.
I am using multiple scraper agents, all running different tasks; each task has its own set of runs and its own dataset. How can I combine all the data into a single dataset using just the Apify platform?
I believe that named datasets can be "shared" across actors. So have each of your agents save to a predetermined dataset name (passed in as an input argument?). Then a master dataset aggregation actor would read from all those named datasets to generate a new one.
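A sketch of that aggregation pattern using the `apify-client` Python package (`pip install apify-client`). The token placeholder and the dataset names are assumptions, and the import sits inside the function so the sketch stays self-contained:

```python
def aggregate_named_datasets(token, source_names, target_name):
    """Read every item from each named source dataset and push it
    into one shared target dataset (created if it does not exist)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # Named datasets persist across runs and can be shared by actors.
    target = client.datasets().get_or_create(name=target_name)
    target_items = client.dataset(target["id"])
    for name in source_names:
        # Named datasets can also be addressed as "username/dataset-name".
        items = client.dataset(name).list_items().items
        if items:
            target_items.push_items(items)

# Example call (token and names are illustrative):
# aggregate_named_datasets(
#     "<APIFY_TOKEN>",
#     ["username/agent-a-results", "username/agent-b-results"],
#     "merged-results",
# )
```

Deduplication (if needed) would happen between `list_items` and `push_items`, since the platform itself only appends items.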
One is the scraper; the other is a separate actor I'm using to merge all the data from the scrapes.
But I need this done for multiple actors/Twitter scrapers, each of which will fetch different items to be merged into a single dataset.
Could you let me know whether this is possible only via actors that merge datasets, or whether the Apify platform itself has a way of merging various datasets automatically?
The requirement is that after every run, all scraper agents should add the scraped data to a centralised database automatically, without the need to merge manually.
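One way to meet that requirement without a separate merge step is to have each scraper open the same named dataset at the end of its run and push into it directly, instead of relying only on the run's default dataset. A sketch with the Apify Python SDK (`pip install apify`); the dataset name "central-results" is an assumption:

```python
async def push_to_central(items, name="central-results"):
    """Append this run's items to a shared named dataset. Named
    datasets are not deleted with the run and are visible to every
    actor under the same account, so all scrapers can write to one."""
    from apify import Actor  # Apify Python SDK

    central = await Actor.open_dataset(name=name)
    await central.push_data(items)
```

Each scraper would call this (inside its `async with Actor:` block) with the items it just collected, so the central dataset grows automatically after every run.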