Apify and Crawlee Official Forum

Updated 3 months ago

Enqueue Links from new Window

Hi. I'm attempting to scrape data from a very JS-heavy website using Crawlee and Playwright. The links I'm interested in are created by a JS function that opens the content in a new window. I've implemented the enqueueLinksByClickingElements function with a very specific selector. Playwright reports successfully clicking the links, but I suspect the request is not being intercepted.
Plain Text
DEBUG Playwright Click Elements: enqueueLinksByClickingElements: There are 1 elements to click.
DEBUG Playwright Click Elements: enqueueLinksByClickingElements: Successfully clicked 1 elements out of 1
DEBUG PlaywrightCrawler: Crawled 1/2 pages, 0 failed requests.
DEBUG PlaywrightCrawler: Crawled 1/2 pages, 0 failed requests.
INFO  PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.

I've also passed in the transformRequestFunction to set useExtendedUniqueKey to true. Is there a way I can:
  1. Take a screenshot after Playwright clicks the element?
  2. Log the intercepted requests?
Thanks!
4 comments
Hi, to intercept requests made by the website's JS, try using page.route (https://playwright.dev/docs/api/class-page#page-route).

You can take a screenshot by calling page.screenshot (https://playwright.dev/docs/screenshots#full-page-screenshots) right after the click.
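For the two questions above (logging requests and taking a screenshot after the click), here is a minimal sketch rather than a definitive implementation. It uses Playwright's page.on('request') event for logging, which only observes requests, instead of page.route, which can also modify or block them. The helper names logRequests and clickAndCapture, the default file name after-click.png, and the small RequestLike/PageLike interfaces are assumptions made for illustration; a real Playwright page should satisfy those interfaces.

```typescript
// Structural stand-ins for the parts of Playwright's Page and Request used here,
// so the sketch type-checks on its own (hypothetical names, not Playwright types).
interface RequestLike {
  url(): string;
  method(): string;
}

interface PageLike {
  on(event: 'request', listener: (request: RequestLike) => void): unknown;
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Record "METHOD URL" for every request the page issues from now on.
function logRequests(page: PageLike, log: string[] = []): string[] {
  page.on('request', (request) => {
    log.push(`${request.method()} ${request.url()}`);
  });
  return log;
}

// Start logging, run the caller-supplied click, then take a full-page screenshot.
async function clickAndCapture(
  page: PageLike,
  click: () => Promise<unknown>,
  screenshotPath = 'after-click.png', // assumed file name
): Promise<string[]> {
  const log = logRequests(page);
  await click();
  await page.screenshot({ path: screenshotPath, fullPage: true });
  return log;
}
```

Inside a Crawlee requestHandler this could be called as `await clickAndCapture(page, () => page.locator('a.open-window').click())`, where the selector is a placeholder. Requests issued by a newly opened window belong to a different page object, so listening on the original page alone may not show them.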
@ondro_k , do you know if using page.route will intercept ALL external requests? I'm scraping a site and some of the pages (not all of them) make an external request to Algolia. I need to intercept those requests so that I can get a piece of information from the request header. I need to make sure that I move on to the next step (scraping the HTML) only once all the requests have finished, since some pages will make the Algolia request and some won't. Any idea how I can handle this?
You can try something like:

Plain Text
await page.route('**/*', (route) => {
  // Abort requests whose URL mentions Algolia; let everything else through.
  return route.request().url().includes('algolia') ? route.abort() : route.continue();
}); // Or do whatever you need with the request.
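If the goal is to read a header from the Algolia request rather than block it, a passive listener may be enough. The sketch below is one possible approach under stated assumptions: captureHeader, HeaderRequest, and HeaderPage are hypothetical names, and the header to read is passed in as a parameter because the thread does not say which one is needed. It relies on Playwright's request.headers(), which returns header names in lower case.

```typescript
// Structural stand-ins for the parts of Playwright's Page and Request used here
// (hypothetical names, not Playwright types).
interface HeaderRequest {
  url(): string;
  headers(): Record<string, string>;
}

interface HeaderPage {
  on(event: 'request', listener: (request: HeaderRequest) => void): unknown;
}

// Collect the value of `headerName` from every request whose URL contains `urlPart`.
// The returned array fills up as requests happen; it stays empty on pages
// that never call the matching service.
function captureHeader(page: HeaderPage, urlPart: string, headerName: string): string[] {
  const values: string[] = [];
  const key = headerName.toLowerCase(); // request.headers() lower-cases header names
  page.on('request', (request) => {
    if (!request.url().includes(urlPart)) return;
    const value = request.headers()[key];
    if (value !== undefined) values.push(value);
  });
  return values;
}
```

The listener needs to be registered before navigation (in Crawlee, for example, from a preNavigationHooks entry). In the request handler, `await page.waitForLoadState('networkidle')` is one way to wait until network activity has settled before reading the collected values and scraping the HTML. Pages that make no Algolia request simply yield an empty array.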


Also this article might be handy:
https://medium.com/@kbalaji.kks/playwright-network-insights-how-to-intercept-modify-delete-and-analyze-network-calls-cde402f103e6