Hi,
Regarding the input question:
The input can be any JSON, but the top level has to be an object. You can work around this by passing an object with a single field containing an array of your objects.
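For example, assuming your items are plain objects, the input could look like this (the `items` field name here is arbitrary, not required by the platform):

```json
{
    "items": [
        { "url": "https://apify.com" },
        { "url": "https://crawlee.dev" }
    ]
}
```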
- Batching multiple items is usually the way to go, since you can process multiple requests in parallel, which makes it more cost-effective.
The default configuration of the crawlers in Crawlee should be quite sensible and sufficient for most use cases. There is no single right approach; tuning is usually a trial-and-error process. I would suggest not spending too much time on optimizations unless you are actually getting blocked. In that case, you can, for example, lower the maximum error score for each session (how many times it can get blocked before being rotated out).
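As a minimal sketch of that session tuning, you can pass sessionPoolOptions when constructing a crawler; the value 1 below is just an illustrative choice, not a recommendation:

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    sessionPoolOptions: {
        sessionOptions: {
            // Retire a session after a single blocked request
            // (the default allows a few errors before rotating).
            maxErrorScore: 1,
        },
    },
    async requestHandler({ request, $ }) {
        // ... your scraping logic ...
    },
});
```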
- You can also pass RequestOptions objects like this:

await crawler.addRequests([
    {
        url: 'https://apify.com',
        userData: { foo: 'bar' },
    },
    {
        url: 'https://crawlee.dev',
        userData: { foo: 'bar' },
    },
]);
With enqueueLinks, you can use either the userData field or the transformRequestFunction function:

await enqueueLinks({
    selector: 'a',
    userData: { foo: 'bar' },
});

await enqueueLinks({
    selector: 'a',
    transformRequestFunction: (request) => {
        request.userData.foo = 'bar';
        return request;
    },
});