Apify and Crawlee Official Forum

Best practice for rendering JavaScript, then doing a deep clone or structuredClone of the window object?

Hello, I am looking for general, high-level advice on the best approach to crawl a site, save its *.js resources, and log the window object. Does anyone have an idea? I'm unsure whether I should lean more on the Playwright API or whether there is a built-in utility or helper function for downloading resources from the site (and analyzing the window object to a depth of 3 or 4). Thanks in advance for any help.

2 comments

Alexey Udovydchenko

The SDK does not provide any special support for that; you need to choose either Puppeteer or Playwright and see which works better for your case. Usually, when you parse values from the browser in actor code, you already know what they are. If not, for example when you need to find an object key, you can reuse https://lodash.com/docs/#findKey, which is available as an SDK dependency.
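For reference, lodash's `_.findKey(object, predicate)` only inspects the top-level properties of the object it is given. If the key may sit a few levels down, a small recursive walk is one option. The sketch below is an assumption about what that could look like: `findKeyPath` is a hypothetical helper, not part of lodash or the SDK, and the default depth of 3 is only taken from the question.

```javascript
// Hypothetical helper (not a lodash or SDK function): depth-limited search
// that returns the dot-separated path of the first value matching `predicate`.
function findKeyPath(obj, predicate, maxDepth = 3, path = []) {
  // Stop on primitives, null, or when the depth budget is used up.
  if (obj === null || typeof obj !== 'object' || maxDepth < 1) return undefined;
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    const here = [...path, key];
    if (predicate(value, key)) return here.join('.');
    // Descend one level; the depth limit also guarantees termination
    // on circular structures.
    const found = findKeyPath(value, predicate, maxDepth - 1, here);
    if (found !== undefined) return found;
  }
  return undefined;
}

// Example with made-up data:
const state = { user: { profile: { token: 'abc' } }, count: 1 };
console.log(findKeyPath(state, (value) => value === 'abc'));
```

Inside the browser this would have to run in the page context (for example via `page.evaluate`), since the window object itself cannot be handed over to Node.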

Lukas Krivka

Hello,

You can extract all the <script> tags from the HTML, which contain the JS loaded with the HTML.
You can catch the responses that contain JS with page.on('response', ...).
There is probably a library for serializing the window object. Generally, it just needs to replace all the references and other non-serializable values.
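The last two points could be sketched roughly as follows. This is only an illustration under a few assumptions: `watchScripts` and `snapshot` are hypothetical helper names, the placeholder strings are arbitrary, and `page` is expected to behave like a Playwright page (an `on('response', ...)` event whose responses expose `headers()` and `request().resourceType()`).

```javascript
// Hypothetical helper: calls `onScript` for every response that looks like
// JavaScript, judged by the request's resource type or the content-type header.
function watchScripts(page, onScript) {
  page.on('response', (response) => {
    const contentType = response.headers()['content-type'] || '';
    const isScript =
      response.request().resourceType() === 'script' ||
      /javascript|ecmascript/i.test(contentType);
    if (isScript) onScript(response);
  });
}

// Hypothetical helper: depth-limited, JSON-safe copy of a value.
// Functions, circular references, and anything below `maxDepth` are replaced
// with placeholder strings. Only own enumerable properties are visited.
function snapshot(value, maxDepth = 3, seen = new WeakSet()) {
  if (value === null) return null;
  const type = typeof value;
  if (type === 'function') return '[Function]';
  if (type === 'symbol' || type === 'bigint') return String(value);
  if (type !== 'object') return value;
  if (seen.has(value)) return '[Circular]';
  if (maxDepth <= 0) return Array.isArray(value) ? '[Array]' : '[Object]';
  seen.add(value); // tracks ancestors on the current path only
  const out = Array.isArray(value) ? [] : {};
  for (const key of Object.keys(value)) {
    try {
      out[key] = snapshot(value[key], maxDepth - 1, seen);
    } catch (err) {
      // Some getters throw when read, e.g. on cross-origin objects.
      out[key] = '[Threw: ' + String(err) + ']';
    }
  }
  seen.delete(value);
  return out;
}
```

In a real crawl, the `onScript` callback could read the body with `response.text()` and store it under a name derived from `response.url()`. `snapshot` has to run inside the page, so one possible way to call it is `page.evaluate('(' + snapshot.toString() + ')(window, 3)')`; that wiring is untested here and may need adjusting for very large pages.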
