Apify Discord Mirror

Updated 2 months ago

AdaptivePlaywrightCrawler starts crawling the whole web at some point.

At a glance

A community member is using AdaptivePlaywrightCrawler, but instead of staying on the start site it appears to crawl the entire web. They shared their code and report that setting the enqueueLinks() strategy to 'same-domain' or 'same-origin' made no difference.

Another community member suggests restricting the crawling to specific domains or patterns by using the pseudoUrls option when calling enqueueLinks(). This seems to work, but the original community member notes that they thought the default behavior of enqueueLinks() was to stay on the same hostname, as mentioned in the documentation.

There is no explicitly marked answer in the comments.

I want to use the AdaptivePlaywrightCrawler, but it seems like it wants to crawl the entire web.
Here is my code.

import { AdaptivePlaywrightCrawler } from 'crawlee';

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    maxRequestsPerCrawl: 50,
    async requestHandler({ request, enqueueLinks, parseWithCheerio, querySelector, log, urls }) {
        console.log(request.url, request.uniqueKey);
        // No options passed, so enqueueLinks() uses its default strategy.
        await enqueueLinks();
    },
});

await crawler.run(['https://crawlee.dev']);
3 comments
By the way, I've also tried strategy: 'same-domain' and strategy: 'same-origin', but the result was the same.
Hi, you have to restrict crawling to specific domains or patterns:
import { AdaptivePlaywrightCrawler } from 'crawlee';

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    maxRequestsPerCrawl: 50,
    async requestHandler({ request, enqueueLinks, parseWithCheerio, querySelector, log, urls }) {
        // Only enqueue links that match the pseudo-URL pattern.
        await enqueueLinks({
            pseudoUrls: ['https://crawlee.dev[.*]'],
        });
    },
});

await crawler.run(['https://crawlee.dev']);
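To see what a pattern such as 'https://crawlee.dev[.*]' would and would not let through, it may help to translate it into a regular expression by hand. The sketch below is a simplified stand-in, not Crawlee's actual implementation. It assumes that text inside [...] is treated as a regular expression, everything else is literal, and the whole URL must match case-insensitively. The helper names pseudoUrlToRegExp and escapeLiteral are made up for illustration, and nested brackets are not handled.

```javascript
// Simplified sketch of pseudo-URL matching (assumed semantics, see lead-in).
// escapeLiteral and pseudoUrlToRegExp are hypothetical helpers, not Crawlee APIs.

// Escape regex metacharacters so literal URL parts match verbatim.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Convert a pseudo-URL into an anchored, case-insensitive RegExp.
function pseudoUrlToRegExp(purl) {
    let pattern = '';
    let rest = purl;
    while (rest.length > 0) {
        const open = rest.indexOf('[');
        if (open === -1) {
            // No more directives: the remainder is literal text.
            pattern += escapeLiteral(rest);
            break;
        }
        const close = rest.indexOf(']', open);
        if (close === -1) {
            throw new Error(`Unclosed "[" in pseudo-URL: ${purl}`);
        }
        pattern += escapeLiteral(rest.slice(0, open)); // literal part
        pattern += rest.slice(open + 1, close);        // raw regex part
        rest = rest.slice(close + 1);
    }
    return new RegExp(`^${pattern}$`, 'i');
}

// The pattern from the reply above, and a tighter variant with a slash.
const loose = pseudoUrlToRegExp('https://crawlee.dev[.*]');
const strict = pseudoUrlToRegExp('https://crawlee.dev/[.*]');

console.log(loose.test('https://crawlee.dev/docs/quick-start'));
console.log(loose.test('https://crawlee.dev.example.com/'));
console.log(strict.test('https://crawlee.dev.example.com/'));
```

Under these assumptions, the pattern without a slash before [.*] also accepts hosts that merely start with crawlee.dev, so placing the slash before the directive may be the safer choice when the goal is to stay on one site.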
Hey, thanks! That kinda works, but I thought the default behavior of enqueueLinks was to stay on the same hostname. It's clearly mentioned in the documentation: https://crawlee.dev/docs/introduction/adding-urls
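For anyone debugging the same thing, a rough way to check which links a hostname-based or origin-based restriction ought to accept is to compare URLs with the standard URL class. This is only a sketch of the idea behind the 'same-hostname' and 'same-origin' strategy names, not Crawlee's code. The isSameSite helper is hypothetical, and 'same-domain' is left out because it would need a public-suffix lookup.

```javascript
// Hypothetical helper, not part of Crawlee: decides whether a discovered link
// stays on the same site as the page it was found on.
function isSameSite(candidateUrl, baseUrl, strategy = 'same-hostname') {
    let candidate;
    let base;
    try {
        base = new URL(baseUrl);
        // Resolve relative links (e.g. "/docs") against the page URL.
        candidate = new URL(candidateUrl, baseUrl);
    } catch {
        return false; // Unparseable URLs are never followed.
    }

    switch (strategy) {
        case 'same-origin':
            // Protocol, hostname and port must all be equal.
            return candidate.origin === base.origin;
        case 'same-hostname':
            // Only the hostname must be equal; protocol and port may differ.
            return candidate.hostname === base.hostname;
        default:
            throw new Error(`Unsupported strategy: ${strategy}`);
    }
}

// Example: filter a list of links found on https://crawlee.dev
const links = [
    '/docs/quick-start',
    'https://crawlee.dev/api/core',
    'https://github.com/apify/crawlee',
];
const kept = links.filter((link) => isSameSite(link, 'https://crawlee.dev'));
console.log(kept);
```

Logging request.url next to the result of such a check inside the requestHandler could help narrow down whether off-site URLs are really being enqueued or are only reached through redirects.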