Apify and Crawlee Official Forum

Updated last week

AdaptivePlaywrightCrawler starts crawling the whole web at some point.

I want to use the AdaptivePlaywrightCrawler, but it seems like it wants to crawl the entire web.
Here is my code.

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    maxRequestsPerCrawl: 50,
    async requestHandler({ request, enqueueLinks, parseWithCheerio, querySelector, log, urls }) {
        console.log(request.url, request.uniqueKey);
        await enqueueLinks();
    }
});

crawler.run(['https://crawlee.dev']);
3 comments
By the way, I've tried with strategy: 'same-domain', 'same-origin' but the result was the same.
Hi, you have to restrict crawling to specific domains or patterns:
import { AdaptivePlaywrightCrawler } from 'crawlee';

const crawler = new AdaptivePlaywrightCrawler({
    renderingTypeDetectionRatio: 0.1,
    maxRequestsPerCrawl: 50,
    async requestHandler({ request, enqueueLinks, parseWithCheerio, querySelector, log, urls }) {
        await enqueueLinks({
            pseudoUrls: ['https://crawlee.dev[.*]'],
        });
    },
});

await crawler.run(['https://crawlee.dev']);
Hey, thanks! That kinda works, but I thought the default behavior of enqueueLinks was to stay on the same hostname. It's clearly mentioned in the documentation: https://crawlee.dev/docs/introduction/adding-urls
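To check whether a crawl really leaves the start hostname, it can help to test the logged URLs outside of Crawlee. The snippet below is a minimal sketch using only the standard `URL` class. `filterSameHostname` is a hypothetical helper name, not a Crawlee API, and it only approximates what a "same hostname" rule would mean; it is not how `enqueueLinks` is implemented internally.

```javascript
// Hypothetical helper (not part of Crawlee): keep only the candidate URLs
// whose hostname matches the hostname of the start URL.
function filterSameHostname(startUrl, candidates) {
    const { hostname } = new URL(startUrl);
    return candidates.filter((candidate) => {
        try {
            // Relative links are resolved against the start URL first.
            return new URL(candidate, startUrl).hostname === hostname;
        } catch {
            return false; // skip anything that cannot be parsed as a URL
        }
    });
}

// Sample links, chosen only for illustration.
const kept = filterSameHostname('https://crawlee.dev', [
    'https://crawlee.dev/docs/introduction/adding-urls',
    '/api/core',
    'https://github.com/apify/crawlee',
    'https://docs.apify.com/',
]);

console.log(kept);
```

Running the URLs printed by `console.log(request.url, ...)` through a check like this shows quickly whether off-site pages are actually being enqueued or only appear in the output.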