I’m working on a project using PlaywrightCrawler to scrape links from a dynamic, JavaScript-rendered website. The challenge is that the <a> tags don’t have href attributes, so I need to click each one and capture the resulting URL. I’m running into three problems:
- Delayed Link Rendering: Links are dynamically rendered with JavaScript, often taking time due to a loader. How can I ensure all links are loaded before clicking?
- Navigation Issues: Some links don’t navigate as expected, or fail when I try to open them in a new context.
- Memory Overload: I get the warning "Memory is critically overloaded" during crawls.
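
To make the click-and-capture idea concrete, here is a minimal sketch of the general approach. It is not the attached code. The names `PageLike`, `LocatorLike`, `collectClickedUrls`, `linkSelector`, and `timeoutMs` are hypothetical; the two interfaces only mirror the handful of Playwright `Page`/`Locator` methods the sketch calls, so it can be read and type-checked without installing Playwright. It assumes each click navigates in the same tab and that `goBack()` restores the link list.

```typescript
// Hypothetical minimal types mirroring the subset of Playwright's
// Page/Locator API used below (assumption: same-tab navigation).
interface LocatorLike {
  count(): Promise<number>;
  nth(index: number): LocatorLike;
  click(): Promise<void>;
}

interface PageLike {
  url(): string;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  waitForLoadState(state?: 'load' | 'domcontentloaded' | 'networkidle'): Promise<void>;
  locator(selector: string): LocatorLike;
  goBack(): Promise<unknown>;
}

// Clicks every element matching `linkSelector` in turn and records the URL
// the page ends up on after each click. Returns the distinct URLs seen.
async function collectClickedUrls(
  page: PageLike,
  linkSelector: string,
  timeoutMs: number = 30000,
): Promise<string[]> {
  const startUrl = page.url();
  const urls = new Set<string>();

  // Wait until at least one link has been rendered before counting.
  await page.waitForSelector(linkSelector, { timeout: timeoutMs });
  const total = await page.locator(linkSelector).count();

  for (let i = 0; i < total; i++) {
    // After going back, the list may be rendered again behind the loader,
    // so wait for it before clicking the i-th link.
    await page.waitForSelector(linkSelector, { timeout: timeoutMs });
    await page.locator(linkSelector).nth(i).click();
    await page.waitForLoadState('domcontentloaded');

    const landed = page.url();
    if (landed !== startUrl) {
      urls.add(landed);
      // Return to the listing page for the next link.
      await page.goBack();
    }
  }
  return Array.from(urls);
}
```

Inside a PlaywrightCrawler `requestHandler`, this would presumably be called as `await collectClickedUrls(page, 'a')`. Two known gaps in the sketch: `waitForSelector` only guarantees that the first matching link exists, not that all of them have rendered, and for client-side routing `waitForLoadState` may return before the URL changes, in which case Playwright's `page.waitForURL` might be a better fit.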
I've attached images of my code, since it was too long to paste here.
How can I handle these issues more efficiently, especially for dynamic and JavaScript-heavy sites?
I would appreciate any help.