Apify and Crawlee Official Forum

Updated 4 months ago

Loading files along with HTML-scraped content via LangChain's ApifyDatasetLoader

The ApifyDatasetLoader for LangChain loads the records, which include the text, metadata, and fileUrl fields. All of the examples show loading content via the text or metadata fields — but what about fileUrl? Assuming the run has records for PDF, XLSX, and/or other files, is there an example of how to load those files alongside the scraped HTML content?

2 comments

Alexey Udovydchenko

It's outside of SDK functionality: https://llamahub.ai/l/apify-dataset. Check their Git repo or post the question there, I guess.

advisor4223

Got it, thanks, will check via the integration repo.
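
For anyone landing here with the same question, one possible starting point is to split the dataset items yourself before handing anything to a loader. The sketch below is not part of the Apify SDK or the LangChain integration. It assumes each item is a plain dict that may carry the text, metadata, and fileUrl fields mentioned in the question. The name split_records and the FILE_EXTENSIONS set are hypothetical and only there for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of extensions to treat as downloadable files; adjust as needed.
FILE_EXTENSIONS = {".pdf", ".xlsx", ".docx", ".csv"}


def split_records(items):
    """Separate dataset items into scraped-text records and file URLs.

    Assumes each item is a dict that may contain "text", "metadata",
    and "fileUrl" keys, as described in the question.
    """
    text_records, file_urls = [], []
    for item in items:
        file_url = item.get("fileUrl")
        if file_url:
            # Look only at the URL path so query strings do not hide the extension.
            suffix = PurePosixPath(urlparse(file_url).path).suffix.lower()
            if suffix in FILE_EXTENSIONS:
                file_urls.append(file_url)
                continue
        if item.get("text"):
            text_records.append(
                {"text": item["text"], "metadata": item.get("metadata", {})}
            )
    return text_records, file_urls
```

The text records could then be wrapped as LangChain Document objects, while the file URLs would be downloaded and passed to whichever PDF or spreadsheet loader you already use. That second step is left out here because it depends on the file types in your run.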
