Scraping single page with load more button

aacanimal

Hi, I just discovered Crawlee and it seems like a great project.

I'm scraping a single URL (https://jobs.workable.com/search) that contains a list of items with a "Load more" button. When an item is clicked, a floating modal shows the item's information.

In this scenario, Crawlee's strengths, such as remembering visited URLs and retrying failed requests, don't help much.

My idea is:

From the start page, click each of the initial items and scrape its content.
Click the "Load more" button and repeat the process.
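A minimal sketch of that loop, assuming a hypothetical `ListingPage` wrapper around the browser page (the `visibleItemIds`, `openItem` and `loadMore` methods are illustrative names, not Crawlee or Puppeteer APIs). Items are tracked by ID in a `Set`, so items already scraped in an earlier round are skipped after each "Load more" click.

```typescript
// Hypothetical abstraction over the browser page; in a real crawler these
// methods would wrap Puppeteer clicks, selectors and waits.
interface ListingPage {
  /** IDs of the job items currently rendered in the list, in display order. */
  visibleItemIds(): Promise<string[]>;
  /** Clicks an item and returns the data shown in its modal. */
  openItem(id: string): Promise<Record<string, string>>;
  /** Clicks "Load more"; resolves false when the button is gone. */
  loadMore(): Promise<boolean>;
}

// Scrapes every item not seen yet, then clicks "Load more", until the button
// disappears or `maxPages` "Load more" clicks have been made.
async function scrapeAll(
  page: ListingPage,
  onItem: (id: string, data: Record<string, string>) => Promise<void>,
  maxPages = 100,
): Promise<number> {
  const seen = new Set<string>();
  for (let round = 0; round <= maxPages; round++) {
    for (const id of await page.visibleItemIds()) {
      if (seen.has(id)) continue; // already scraped in an earlier round
      const data = await page.openItem(id);
      await onItem(id, data);
      seen.add(id); // mark only after a successful scrape
    }
    if (round === maxPages || !(await page.loadMore())) break;
  }
  return seen.size; // number of distinct items scraped
}
```

Keying on an item ID rather than a list position keeps the loop correct even if the page re-renders earlier items after "Load more".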

I'd like advice on best practices for:

how to "remember"/store the index or ID of the last scraped item
how to handle errors
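One possible approach to both questions, as a sketch: persist the set of finished item IDs after every success, and wrap each item in a retry helper with exponential backoff so one bad item doesn't abort the run. `ProgressStore`, `withRetries` and `scrapeWithProgress` are hypothetical names; in a real crawler the store could be backed by something persistent such as Crawlee's `KeyValueStore`, but here it is kept in memory so the example has no dependencies.

```typescript
// Hypothetical persistence interface (could wrap a key-value store or a file).
interface ProgressStore {
  load(): Promise<string[]>;
  save(doneIds: string[]): Promise<void>;
}

// In-memory implementation, useful for local testing only.
class MemoryProgressStore implements ProgressStore {
  private ids: string[] = [];
  async load(): Promise<string[]> { return [...this.ids]; }
  async save(doneIds: string[]): Promise<void> { this.ids = [...doneIds]; }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn up to `attempts` times, waiting baseDelayMs * 2^i between tries,
// and rethrows the last error if every attempt fails.
async function withRetries<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Skips IDs already stored, saves progress after each success, and collects
// items that still fail after all retries instead of stopping the run.
async function scrapeWithProgress(
  ids: string[],
  store: ProgressStore,
  scrapeOne: (id: string) => Promise<void>,
  retryDelayMs = 500,
): Promise<{ scraped: string[]; failed: string[] }> {
  const done = new Set(await store.load());
  const scraped: string[] = [];
  const failed: string[] = [];
  for (const id of ids) {
    if (done.has(id)) continue; // finished in a previous run
    try {
      await withRetries(() => scrapeOne(id), 3, retryDelayMs);
      done.add(id);
      scraped.push(id);
      await store.save([...done]); // persist immediately so a crash loses little work
    } catch {
      failed.push(id); // give up on this item, keep going with the rest
    }
  }
  return { scraped, failed };
}
```

Storing IDs rather than a single "last index" means a restarted run can skip finished items even if the list order changes between visits.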

Thanks in advance

tthek1tten

I'd recommend checking out the infiniteScroll function in Crawlee:
https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#infiniteScroll

Your "Load more" use case can be handled with the buttonSelector option, which checks for a button and clicks it if it appears while scrolling.

See more in the docs: https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector

tthek1tten

And clicking each item on the page can be done in the stopScrollCallback:

https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector
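As I understand the docs, `stopScrollCallback` is called after every scroll and stops scrolling when it returns true. A rough sketch of a callback factory built on that assumption: it handles newly rendered items and asks to stop once `maxItems` have been processed. `getVisibleIds` and `handle` are hypothetical helpers you would implement with page selectors and clicks.

```typescript
// Builds a function intended for infiniteScroll's stopScrollCallback option.
// After each scroll it handles any item IDs not handled yet, and returns
// true (stop scrolling) once `maxItems` items have been handled.
function makeStopScrollCallback(
  getVisibleIds: () => Promise<string[]>, // hypothetical: reads item IDs from the page
  handle: (id: string) => Promise<void>,  // hypothetical: clicks the item, scrapes the modal
  maxItems: number,
): () => Promise<boolean> {
  const handled = new Set<string>();
  return async () => {
    for (const id of await getVisibleIds()) {
      if (handled.size >= maxItems) break;
      if (handled.has(id)) continue; // processed after an earlier scroll
      await handle(id);
      handled.add(id);
    }
    return handled.size >= maxItems;
  };
}
```

It could then be passed along with the button selector, e.g. `puppeteerUtils.infiniteScroll(page, { buttonSelector, stopScrollCallback })`, where the selector for the site's "Load more" button is yours to fill in.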

aacanimal

Thanks!

Apify and Crawlee Official Forum