Pagination works locally in Crawlee, but in the same actor on Apify the pagination does not work correctly
At a glance
The community member has implemented pagination to scrape data from multiple pages, but the actor running on Apify.com does not start from the intended page 2 and does not finish at page 5 as expected. The community members suggest various troubleshooting steps, such as verifying the latest commit, checking the actor input, and ensuring the state data does not interfere with the scraping. They also discuss potential issues with parsing the page number input and using global variables. Finally, the community member states that the issue was a bug in the implementation of the query params, and they have now solved the problem.
I have implemented pagination that can start from, e.g., page 2 and end at page 5 (inclusive) to scrape all the data from each page. It works correctly on my local machine, and I have pushed the newest working code (newest commit id) to GitHub and then to Apify via webhook. However, when I run the actor on Apify.com, it starts at the first page instead of page 2 and does not finish at page 5. Any suggestions on what might be wrong?
I have verified that the commit id in the latest build I run the scraper with is the latest commit on the GitHub master branch, so I would not suspect this to be the issue, and I can also see that the latest change (test logging) was in the latest run as well. But I guess it never hurts to try it out. The input also seems to work; however, locally it is a string and on Apify it is a number, though this does not explain why the pagination still just cuts off at page 4?
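Since the input arrives as a string locally and as a number on Apify, one way to rule out type-related surprises (for example `'2' + 1` giving `'21'`, or string comparison of page numbers) is to normalize both values once before using them. This is only a minimal sketch: the field names `startPage` and `endPage` and both helper functions are hypothetical, not taken from the original actor.

```typescript
// Hypothetical input shape; the real actor's field names may differ.
interface PaginationInput {
    startPage?: number | string;
    endPage?: number | string;
}

// Convert a string or number into a positive integer page number,
// falling back to a default when the value is missing or invalid.
function toPageNumber(value: unknown, fallback: number): number {
    const parsed = typeof value === 'number' ? value : Number.parseInt(String(value), 10);
    return Number.isInteger(parsed) && parsed >= 1 ? parsed : fallback;
}

// Resolve an inclusive page range; the end is never lower than the start.
function resolvePageRange(input: PaginationInput): { start: number; end: number } {
    const start = toPageNumber(input.startPage, 1);
    const end = toPageNumber(input.endPage, start);
    return { start, end: Math.max(start, end) };
}
```

With this in place, `resolvePageRange({ startPage: '2', endPage: 5 })` and `resolvePageRange({ startPage: 2, endPage: '5' })` resolve to the same numeric range, so the local and platform runs behave alike.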
Hmm, I can't generate more ideas without seeing at least something. I once had an issue where a selector was missing when I ran the crawler on the Apify platform, but it worked perfectly locally. I can't recall what was causing that.
Yeah, that could be an issue. However, I change the URL to navigate to the page I want, and then once I have started on the intended page I find the next page button link.
It also seems that even though I push the actor using "npx apify-cli push", the actor does not get updated, because I cannot see the console.log messages I have added locally.
I have solved the issue now. It was a bug in my implementation of the query params. It would be really nice if such functionality were added to Crawlee and Apify in the future to reduce the risk of bugs.
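The thread does not show the actual bug, but a common way to reduce mistakes when setting a page number in a URL is to use the standard `URL` and `URLSearchParams` APIs instead of string concatenation. The following is a hedged sketch, not the poster's fix: the parameter name `page`, the example URL, and both helper functions are assumptions for illustration.

```typescript
// Set (or replace) the page query parameter on a listing URL.
// The parameter name 'page' is an assumption; sites may use another name.
function buildPageUrl(baseUrl: string, page: number, paramName: string = 'page'): string {
    const url = new URL(baseUrl);
    // set() overwrites an existing value instead of appending a duplicate.
    url.searchParams.set(paramName, String(page));
    return url.toString();
}

// Build the URLs for an inclusive page range, e.g. pages 2 through 5.
function buildPageUrls(baseUrl: string, start: number, end: number): string[] {
    const urls: string[] = [];
    for (let page = start; page <= end; page++) {
        urls.push(buildPageUrl(baseUrl, page));
    }
    return urls;
}
```

For example, `buildPageUrls('https://example.com/list?sort=asc&page=1', 2, 5)` yields four URLs, one for each of pages 2 to 5, with the existing `sort` parameter preserved. The resulting list could then be passed to the crawler as start requests.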