Apify and Crawlee Official Forum

Updated 2 years ago

Best practice for rendering JavaScript, then doing a deep clone or structuredClone of the window object?

Hello, I am looking for general, high-level advice on the best approach to crawl a site, save the *.js resources, and log the window object. Does anyone have an idea? I'm a little unsure whether I should lean more on the Playwright API, or whether there is a built-in utility or helper function for downloading resources (and analyzing the window object at a depth of 3 or 4) from the site. Thanks in advance for any help.
2 comments
The SDK does not provide any special support for that; you need to choose either the Puppeteer or the Playwright framework and see which works better for your case. Usually, when you parse values from the browser in Actor code, you know what they are. If not, e.g. to find an object key by some property of its value, you can reuse https://lodash.com/docs/#findKey - it's available as an SDK dependency.
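For readers unfamiliar with lodash's `_.findKey`, here is a minimal sketch of what it does (a hand-rolled equivalent, not the lodash implementation): it returns the first key of an object whose value satisfies a predicate, or `undefined` if none matches.

```javascript
// Sketch of lodash's _.findKey behavior: scan an object's own keys and
// return the first key whose value satisfies the predicate.
function findKey(obj, predicate) {
  for (const key of Object.keys(obj)) {
    if (predicate(obj[key])) return key;
  }
  return undefined; // no matching key
}

// Example: find which key on a window-like object holds a config object.
const windowLike = { title: 'page', __CONFIG__: { apiUrl: '/v1' } };
const configKey = findKey(
  windowLike,
  (v) => typeof v === 'object' && v !== null && 'apiUrl' in v,
);
```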
Hello,
  1. You can extract all the `<script>` tags from the HTML, which contain the JS loaded with the HTML.
  2. You can catch the responses that contain JS with `page.on('response', ...)`.
  3. There is probably some library for serializing the window object. Generally, it will just need to replace all the circular references and other non-serializable values.
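Point 3 can be sketched without any library: a depth-limited serializer that replaces functions with placeholders and guards against circular references. This is a minimal illustration (the function name and placeholder strings are my own, not from any SDK); inside Playwright you would run it in the page context via `page.evaluate()` and pass it the `window` object.

```javascript
// Hypothetical sketch: serialize an object (e.g. the browser's `window`)
// down to a limited depth, dropping non-serializable values and
// guarding against circular references.
function serializeAtDepth(obj, maxDepth, seen = new WeakSet()) {
  if (obj === null || typeof obj !== 'object') {
    // Keep primitives as-is; replace functions/symbols with a placeholder.
    return typeof obj === 'function' || typeof obj === 'symbol'
      ? '[unserializable]'
      : obj;
  }
  if (seen.has(obj)) return '[circular]';
  if (maxDepth <= 0) return '[max depth reached]';
  seen.add(obj);
  const out = Array.isArray(obj) ? [] : {};
  for (const key of Object.keys(obj)) {
    try {
      out[key] = serializeAtDepth(obj[key], maxDepth - 1, seen);
    } catch {
      // Some browser properties throw on access (e.g. cross-origin frames).
      out[key] = '[inaccessible]';
    }
  }
  return out;
}
```

With a depth of 3 or 4, as mentioned in the question, this keeps the output bounded even for something as large as `window`; everything deeper is collapsed into a `'[max depth reached]'` marker.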