Apify Discord Mirror

Nth
Joined January 10, 2025
I'm using the AdaptivePlaywrightCrawler with the same-domain strategy in enqueueLinks. The page I'm trying to crawl has delayed JavaScript redirects to other sites, such as Instagram. Sometimes the crawler mistakenly thinks it's still on the original domain after a redirect and resolves Instagram's links against the main domain, producing URLs like example.com/account/... and example.com/member/..., which don't actually exist. How can I stop the crawler from following these delayed JavaScript redirects?
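One possible way to guard against this, as a rough sketch only: compare the URL the page actually ended up on with the URL that was requested, and skip enqueueLinks when they are on different sites. The helper name isStillOnSite is hypothetical (it is not part of Crawlee), and the hostname comparison is a simplification that treats subdomains as the same site and ignores public-suffix rules. The usage in the comment assumes the handler is running in a browser, so that page is available.

```javascript
// Hypothetical helper (not part of Crawlee): returns true when `currentUrl`
// is on the same host as `startUrl`, or on a subdomain of it.
// Simplification: a leading "www." is ignored and public suffixes are not handled.
function isStillOnSite(startUrl, currentUrl) {
    const strip = (host) => host.replace(/^www\./, '');
    const start = strip(new URL(startUrl).hostname);
    const current = strip(new URL(currentUrl).hostname);
    return current === start || current.endsWith(`.${start}`);
}

// Possible use inside a request handler (assumption: `page` is available,
// i.e. the request is being rendered in a browser):
//
//   if (!isStillOnSite(request.url, page.url())) {
//       log.warning(`Redirected off-site, skipping links for ${request.url}`);
//       return;
//   }
//   await enqueueLinks({ strategy: 'same-domain' });
```

Because the redirect is delayed, a check like this can still race with it, so it reduces the problem rather than ruling it out.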
5 comments
I have the following code for AdaptivePlaywrightCrawler and I want to log the number of enqueued links after calling enqueueLinks.

router.addDefaultHandler(async ({ request, enqueueLinks, parseWithCheerio, querySelector, log, page }) => {
    await enqueueLinks({
        strategy: 'same-domain',
        globs: globs,
        transformRequestFunction: (request) => {
            return request;
        },
    });
});
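Since the handler above already passes a transformRequestFunction, one option is to count inside it. The sketch below is an assumption-laden illustration, not a Crawlee feature: withCounter is a hypothetical helper that wraps any transform function and counts how many requests it lets through. The number is the count of candidate links that reached the transform and were not dropped, which may be higher than the number of new entries in the queue, because duplicates are only filtered later.

```javascript
// Hypothetical helper: wraps a transformRequestFunction and counts how many
// requests it keeps (i.e. does not drop by returning a falsy value).
function withCounter(transform = (req) => req) {
    const counter = { kept: 0 };
    const wrapped = (req) => {
        const result = transform(req);
        if (result) counter.kept += 1;
        return result;
    };
    return { counter, wrapped };
}

// Possible use inside the handler:
//
//   const { counter, wrapped } = withCounter((request) => request);
//   await enqueueLinks({
//       strategy: 'same-domain',
//       globs,
//       transformRequestFunction: wrapped,
//   });
//   log.info(`Links passed to the queue from ${request.url}: ${counter.kept}`);
```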
1 comment
I want to use the AdaptivePlaywrightCrawler, but it seems like it wants to crawl the entire web.
Here is my code.

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    maxRequestsPerCrawl: 50,
    async requestHandler({ request, enqueueLinks, parseWithCheerio, querySelector, log, urls }) {
        console.log(request.url, request.uniqueKey);
        await enqueueLinks();
    }
});
crawler.run(['https://crawlee.dev']);
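One way to keep a crawl like this on a single site, sketched under assumptions: pass an explicit filter to enqueueLinks instead of relying on its defaults. The helper onlyHost below is hypothetical (it is not part of Crawlee); it builds a transformRequestFunction that keeps a request only when its URL is on the given hostname and drops everything else.

```javascript
// Hypothetical helper: builds a transformRequestFunction that keeps only
// requests whose URL is on `allowedHost` and drops everything else.
function onlyHost(allowedHost) {
    return (req) => {
        try {
            return new URL(req.url).hostname === allowedHost ? req : false;
        } catch {
            return false; // unparsable URL: drop it
        }
    };
}

// Possible use inside the request handler:
//
//   await enqueueLinks({
//       strategy: 'same-hostname',
//       transformRequestFunction: onlyHost('crawlee.dev'),
//   });
```

The exact-match comparison is deliberately strict, so subdomains are dropped too; loosen it if those pages should be crawled.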
3 comments