Apify and Crawlee Official Forum

Updated 2 years ago

Scraping single page with load more button

Hi, I just discovered Crawlee and it seems like a great project.

I'm scraping a single URL (https://jobs.workable.com/search) that contains a list of items with a "Load more" button. Each time an item is clicked, a floating modal shows the item's information.

In this scenario, Crawlee's ability to remember visited URLs, retry requests, etc. doesn't help much.

My idea is:
  • From the start page, click on each of the initial items and scrape its content
  • Click on the load more button and repeat the process.
The help I'm requesting is on how to apply best practices for:
  • how to "remember/store" the index/ID of the last scraped item
  • how to handle errors
Thanks in advance
3 comments
I'd recommend checking out the infiniteScroll function in Crawlee:
https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#infiniteScroll

Your use case with a "Load more" button can be solved with the buttonSelector option, which checks for a button and clicks it if it appears while scrolling.

See more in the docs: https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector
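
As a rough sketch of the idea, the options for infiniteScroll could be assembled like this. The LoadMoreScrollOptions interface and the loadMoreOptions helper are hypothetical names, declared locally so the snippet runs without Crawlee installed. The selector 'button.load-more' and the timing values are placeholders, not taken from the actual page.

```typescript
// Local mirror of a few infiniteScroll options, declared here only so the
// sketch type-checks without Crawlee installed.
interface LoadMoreScrollOptions {
    buttonSelector: string; // CSS selector of the "Load more" button
    timeoutSecs: number;    // how many seconds to keep scrolling
    waitForSecs: number;    // stop once nothing new has loaded for this long
}

// Hypothetical helper: validates the selector and fills in the timing values.
function loadMoreOptions(buttonSelector: string, timeoutSecs: number = 120): LoadMoreScrollOptions {
    if (buttonSelector.trim() === '') {
        throw new Error('buttonSelector must not be empty');
    }
    return { buttonSelector, timeoutSecs, waitForSecs: 5 };
}

// The selector is a placeholder; inspect the real page to find the right one.
const options = loadMoreOptions('button.load-more');
console.log(options);

// Inside a PuppeteerCrawler requestHandler this would be passed on roughly as:
//   await puppeteerUtils.infiniteScroll(page, options);
```

Setting a non-zero timeoutSecs is a judgment call here: it keeps a long listing from scrolling indefinitely if the button never disappears.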
Clicking each item on the page can be done in the stopScrollCallback:

https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#buttonSelector
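
For the two questions about remembering scraped items and handling errors, one possible approach is to keep a small progress record keyed by item ID and consult it each time the callback runs. The sketch below is not part of Crawlee. ScrapeProgress, ProgressSnapshot, and the maxAttempts limit are hypothetical names chosen for illustration, and it assumes each item exposes some stable ID.

```typescript
// Plain-data snapshot, so progress could be saved and restored between runs.
interface ProgressSnapshot {
    done: string[];
    failures: Record<string, number>;
}

// Hypothetical helper: tracks which item IDs were scraped and which failed.
class ScrapeProgress {
    private maxAttempts: number;
    private done = new Set<string>();
    private failures = new Map<string, number>();

    constructor(maxAttempts: number = 3, snapshot?: ProgressSnapshot) {
        this.maxAttempts = maxAttempts;
        if (snapshot) {
            snapshot.done.forEach((id) => this.done.add(id));
            Object.keys(snapshot.failures).forEach((id) => {
                this.failures.set(id, snapshot.failures[id]);
            });
        }
    }

    // IDs still worth clicking: not scraped yet and not out of attempts.
    pending(visibleIds: string[]): string[] {
        return visibleIds.filter(
            (id) => !this.done.has(id) && (this.failures.get(id) ?? 0) < this.maxAttempts,
        );
    }

    // Record a successful scrape and clear any earlier failures for the item.
    markDone(id: string): void {
        this.done.add(id);
        this.failures.delete(id);
    }

    // Record one more failed attempt for the item.
    markFailed(id: string): void {
        this.failures.set(id, (this.failures.get(id) ?? 0) + 1);
    }

    snapshot(): ProgressSnapshot {
        const failures: Record<string, number> = {};
        this.failures.forEach((count, id) => { failures[id] = count; });
        return { done: Array.from(this.done), failures };
    }
}

// Intended use inside stopScrollCallback (outline only):
//   for each id in progress.pending(idsCurrentlyOnPage):
//     try { click the item, read the modal, progress.markDone(id) }
//     catch { progress.markFailed(id) }
//   return false to keep scrolling
```

Wrapping each click in its own try/catch means one broken modal does not abort the whole run, and the snapshot could be written to persistent storage so a restarted run skips items that are already done.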