I was told to post this here instead of #chat by DanielDo:
I'm looking for any helpful links, articles, or source code for writing actors that split a collection of objects from a dataset into paged collections for batching. I want to support actor input for capping the total number of dataset records that may be processed, setting the size of each page/batch, etc.
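To make the paging part concrete, here is a minimal sketch of how the cap and the page size could turn into (offset, limit) pairs. The names `plan_pages`, `Page`, `total_records`, `page_size`, and `max_records` are hypothetical, and it assumes the dataset reader can be queried by offset and limit.

```python
from typing import Iterator, NamedTuple, Optional


class Page(NamedTuple):
    """One batch of records, described by where it starts and how many it holds."""
    offset: int
    limit: int


def plan_pages(
    total_records: int,
    page_size: int,
    max_records: Optional[int] = None,
) -> Iterator[Page]:
    """Yield (offset, limit) pairs covering at most max_records records.

    All parameter names are hypothetical actor inputs, not part of any SDK.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Apply the optional cap; a negative cap is treated as zero.
    cap = total_records if max_records is None else min(total_records, max(max_records, 0))
    for offset in range(0, cap, page_size):
        # The last page may be shorter than page_size.
        yield Page(offset=offset, limit=min(page_size, cap - offset))
```

Each yielded pair could then be passed to whatever call reads a slice of the dataset, so only one page of records is held in memory at a time.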
Each retrieved object will have a URL under one of its keys, which the actor will then fetch and save to the local fs, so I'd like to make sure the actor can stop and resume where it left off without redundant fetches or fs operations.
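For the stop/resume requirement, one possible approach is a small checkpoint of already-fetched URLs that is written after every successful download. This is only a sketch under assumptions: `download_all`, `load_done`, `save_done`, the JSON state file, and the injected `fetch` callable are all hypothetical, and files are named by a hash of the URL purely for illustration.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Callable, Iterable, Mapping, Set


def load_done(state_path: Path) -> Set[str]:
    """Return the set of URLs recorded as finished, or an empty set on first run."""
    if state_path.exists():
        return set(json.loads(state_path.read_text()))
    return set()


def save_done(state_path: Path, done: Set[str]) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = state_path.with_name(state_path.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, state_path)


def download_all(
    records: Iterable[Mapping[str, str]],
    out_dir: Path,
    state_path: Path,
    fetch: Callable[[str], bytes],  # hypothetical: returns the body for a URL
) -> int:
    """Fetch each record's image once; return how many fetches this run made."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done = load_done(state_path)
    fetched = 0
    for record in records:
        url = record["image"]
        if url in done:
            continue  # already saved by an earlier run
        target = out_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
        part = target.with_name(target.name + ".part")
        part.write_bytes(fetch(url))
        os.replace(part, target)  # only complete files get the final name
        done.add(url)
        save_done(state_path, done)
        fetched += 1
    return fetched
```

Writing to a `.part` file and renaming means an interrupted run never leaves a half-written image under its final name. If the platform offers a persistent key-value store, the same set of URLs could be kept there instead of in a local JSON file.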
The end goal is to go from a dataset with records in the shape of { image: 'https://..../x.png', identifier: 'My Image' } to a zipped archive of all of the images, with each image nested under a parent directory named after the identifier key of its record.
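The archive layout described above could be sketched with the standard library's `zipfile` module. The helper names `safe_dir_name` and `build_archive` are hypothetical, and the sanitising rule (replace anything outside a small allowed character set with `_`) is an assumption about what makes a safe directory name.

```python
import re
import zipfile
from pathlib import Path
from typing import Iterable, List, Tuple


def safe_dir_name(identifier: str) -> str:
    """Turn a record identifier into a directory name without path separators."""
    cleaned = re.sub(r"[^A-Za-z0-9._ -]+", "_", identifier).strip(" .")
    return cleaned or "unnamed"


def build_archive(items: Iterable[Tuple[str, Path]], archive_path: Path) -> List[str]:
    """Zip each (identifier, file) pair as '<identifier>/<file name>'.

    Returns the archive member names in the order they were written.
    Pairs that map to the same member name are not deduplicated here.
    """
    names: List[str] = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for identifier, file_path in items:
            arcname = f"{safe_dir_name(identifier)}/{file_path.name}"
            zf.write(file_path, arcname=arcname)
            names.append(arcname)
    return names
```

A record like { image: '.../x.png', identifier: 'My Image' } would end up in the archive as `My Image/x.png`, provided the downloaded file was saved as `x.png`.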