Apify and Crawlee Official Forum

Updated 5 months ago

Batch PDF Text extraction

Hello,

I'm new to Apify and tested your Website Content Crawler, which worked great. I downloaded several PDFs in the process, and they are now stored in a database file on Apify.

I can manually extract the text with the PDF Text Extractor, one PDF at a time, using each file's key-value store link.

However, for multiple PDFs that is not efficient.

If I provide a database link or a key-value store link covering all the PDFs, the PDF Text Extractor reports an invalid file format.

Is there a way to batch process all these PDFs?

Thank you very much 🙂
2 comments
We will get back to you soon!
Input can be an array of URLs. This way, you can process multiple URLs simultaneously:
https://console.apify.com/actors/QbKEOrw6PkLcy4Xms/information/latest/readme#input

However, there's no direct way to retrieve all values at once.

You can try using the Apify API:
https://docs.apify.com/api/v2/#/reference/key-value-stores

or you can access the key-value store from your code:
https://docs.apify.com/sdk/js/reference/class/Actor#openKeyValueStore
https://docs.apify.com/sdk/js/docs/next/guides/result-storage#key-value-store

Simply loop over the keys, build a record URL for each key, and then use the result as an array of URLs.
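For illustration, here is a minimal Python sketch of that loop using the Apify API's key-value store endpoints (list keys, then build one record URL per key). A few things here are assumptions rather than facts from this thread: the helper names `iter_keys` and `pdf_record_urls` are made up, the store ID and token are placeholders, and the default filter assumes the PDF keys end in ".pdf", which may not match how your crawler named them. Record URLs of a private store will likely also need your API token before another Actor can download them.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Optional

API_BASE = "https://api.apify.com/v2"


def fetch_json(url: str) -> dict:
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_keys(store_id: str, token: Optional[str] = None,
              fetch: Callable[[str], dict] = fetch_json) -> Iterator[str]:
    """Yield every key in the store, following the API's pagination."""
    start: Optional[str] = None
    while True:
        params = {"limit": 1000}
        if token:
            params["token"] = token
        if start:
            params["exclusiveStartKey"] = start
        query = urllib.parse.urlencode(params)
        data = fetch(f"{API_BASE}/key-value-stores/{store_id}/keys?{query}")["data"]
        for item in data["items"]:
            yield item["key"]
        start = data.get("nextExclusiveStartKey")
        if not data.get("isTruncated") or not start:
            break


def pdf_record_urls(store_id: str, token: Optional[str] = None,
                    fetch: Callable[[str], dict] = fetch_json,
                    is_pdf: Callable[[str], bool] = lambda k: k.lower().endswith(".pdf"),
                    ) -> List[str]:
    """Return one record URL per key accepted by is_pdf.

    The default is_pdf is an assumption (keys ending in ".pdf");
    pass your own predicate if your keys are named differently.
    """
    return [
        f"{API_BASE}/key-value-stores/{store_id}/records/{urllib.parse.quote(key, safe='')}"
        for key in iter_keys(store_id, token, fetch)
        if is_pdf(key)
    ]
```

Calling `pdf_record_urls("YOUR_STORE_ID", token="YOUR_API_TOKEN")` should give you a list you can paste into the extractor's URL array input. The `fetch` parameter exists only so the function can be tried out without network access.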