acanimal

Approach to store scraped data in a database (Postgres)

(Apologies for the cross-post: https://github.com/apify/crawlee/discussions/1577)

Hi, I recently discovered Crawlee and I'm trying to figure out how to store the scraped data in a database instead of in the local storage directory.

Is there a plugin for that? If not, how should I go about implementing one? Do I need to write my own class that implements the StorageClient interface? If so, how do I inject it afterwards so that it gets used?
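For illustration, here is a minimal sketch of one possible approach that sidesteps a custom StorageClient: write each item to Postgres directly from the request handler. Everything named here is an assumption made for the example, not part of Crawlee: the `SqlClient` interface (node-postgres' `Pool#query(text, values)` happens to fit it), the `ScrapedItem` shape, the `scraped_items` table, and the `buildUpsert` and `saveItem` helpers.

```typescript
// Hypothetical minimal shape of a SQL client. node-postgres' `Pool#query(text, values)`
// is compatible with it, but any client with this signature would do.
interface SqlClient {
  query(text: string, values: unknown[]): Promise<unknown>;
}

// Hypothetical record shape; adjust it to whatever the crawler extracts.
interface ScrapedItem {
  url: string;
  title: string;
}

// Builds a parameterized upsert so that re-scraping a URL updates its row
// instead of failing on a duplicate key.
// Assumes a table like:
//   CREATE TABLE scraped_items (url text PRIMARY KEY, data jsonb NOT NULL);
function buildUpsert(item: ScrapedItem): { text: string; values: unknown[] } {
  return {
    text:
      'INSERT INTO scraped_items (url, data) VALUES ($1, $2) ' +
      'ON CONFLICT (url) DO UPDATE SET data = EXCLUDED.data',
    values: [item.url, JSON.stringify(item)],
  };
}

// Intended to be called from the crawler's request handler,
// in place of writing the item to the default local dataset.
async function saveItem(db: SqlClient, item: ScrapedItem): Promise<void> {
  const { text, values } = buildUpsert(item);
  await db.query(text, values);
}
```

Keeping the SQL construction in a pure function makes it easy to check without a running database, and the upsert keeps repeated runs idempotent per URL.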

Thanks!

17 comments

acanimal

Scraping single page with load more button

Hi, I just discovered Crawlee, and it seems like a great project.

I'm scraping a single URL (https://jobs.workable.com/search) that contains a list of items with a "load more" button. Each time an item is clicked, a floating modal shows the item's information.

In this scenario, Crawlee's features for remembering visited URLs, retrying requests, etc. don't help.

My idea is:

From the start page, click on each of the initial items and scrape its content
Click on the load more button and repeat the process.

I'm looking for help with best practices for:

how to "remember/store" the index/id of the last scraped item
how to handle errors
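
The idea above could be sketched roughly as follows. This is only a sketch under stated assumptions: `ListPage`, `Progress`, `pending`, and `scrapeAll` are hypothetical names invented for the example, and the real clicks and selectors (for example via Playwright) would live behind the `ListPage` methods. Progress is tracked by item id rather than by index, so that it stays valid if the list order changes; the `Progress` object is plain JSON and could be persisted between runs, for instance in a key-value store.

```typescript
// Hypothetical abstraction over the page; with Playwright or Puppeteer each method
// would wrap the real clicks and selectors, which are not shown here.
interface ListPage {
  visibleItemIds(): Promise<string[]>;      // ids of the items currently rendered
  scrapeItem(id: string): Promise<unknown>; // open the modal, read it, close it
  loadMore(): Promise<boolean>;             // false when there is nothing more to load
}

// Progress that can be serialized (e.g. as JSON) and restored after a crash.
interface Progress {
  done: string[];
  failed: Record<string, string>; // id -> last error message
}

// Ids from the current batch that were neither scraped nor given up on.
function pending(progress: Progress, ids: string[]): string[] {
  return ids.filter((id) => progress.done.indexOf(id) === -1 && !(id in progress.failed));
}

async function scrapeAll(
  page: ListPage,
  progress: Progress,
  onItem: (id: string, data: unknown) => Promise<void>,
  maxAttempts = 3,
): Promise<Progress> {
  let more = true;
  while (more) {
    for (const id of pending(progress, await page.visibleItemIds())) {
      // Retry each item a few times; one broken item should not stop the run.
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          await onItem(id, await page.scrapeItem(id));
          progress.done.push(id);
          break;
        } catch (err) {
          if (attempt === maxAttempts) progress.failed[id] = String(err);
        }
      }
    }
    // Click "load more" and continue with whatever new items appear.
    more = await page.loadMore();
  }
  return progress;
}
```

Restarting with a previously saved `Progress` skips everything in `done`; clearing `failed` before a run gives those items another chance.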

Thanks in advance

3 comments

Apify and Crawlee Official Forum