Apify and Crawlee Official Forum

aLmktr
Joined October 4, 2024
My crawler built with PlaywrightCrawler works fine, but I run into a problem as soon as I add a proxy.
This is the code:

Plain Text
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee";

const startUrls = ['http://quotes.toscrape.com/js/'];

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, parseWithCheerio }) => {
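        // Wait for the JavaScript-rendered quotes to appear (up to 60 seconds).
        await page.waitForSelector("div.quote span.text", { timeout: 60000 });
        // Load the rendered HTML into Cheerio for parsing.
        const $ = await parseWithCheerio();

        // Print the text of every quote on the page.
        const quotes = $("div.quote span.text");
        quotes.each((_, element) => { console.log($(element).text()); });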
    },
});

await crawler.run(startUrls);


However, when I add my proxy, I always get timeout errors:

Plain Text
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ["url-to-proxy-port-im-using"]
});

// and then add it to the crawler
const crawler = new PlaywrightCrawler({
  proxyConfiguration,
  ...


The same code with the same proxy configuration works with CheerioCrawler.
Can anyone help with this issue?
5 comments
I'm working on a news web crawler and set purgeOnStart=false so that I don't scrape duplicate news. However, in some cases I get the message "All requests from the queue have been processed, the crawler will shut down." and the crawler doesn't run. Any suggestions for fixing this?
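
One possible explanation, offered as an assumption rather than a confirmed diagnosis: Crawlee's request queue deduplicates requests by their unique key, which by default is derived from the URL. If storage is not purged, the start URLs from the previous run may still be recorded as handled, so adding them again does nothing and the queue looks finished right away. The sketch below only simulates that behaviour. `PersistedQueueSketch` and the example URL are hypothetical stand-ins, not Crawlee's actual classes or the poster's site.

Plain Text
```javascript
// Hypothetical stand-in for a persisted request queue; this is NOT Crawlee's real class.
class PersistedQueueSketch {
    constructor(storedKeys = []) {
        // Keys that survived from an earlier run because storage was not purged.
        this.handledKeys = new Set(storedKeys);
        this.pending = [];
    }

    // Adds a URL unless its unique key (here simply the URL) is already known.
    addRequest(url) {
        const alreadyKnown = this.handledKeys.has(url) || this.pending.includes(url);
        if (!alreadyKnown) this.pending.push(url);
        return { url, wasAlreadyPresent: alreadyKnown };
    }

    // Marks everything pending as handled and returns how many requests were processed.
    run() {
        const handled = this.pending.length;
        for (const url of this.pending) this.handledKeys.add(url);
        this.pending = [];
        return handled;
    }
}

// Assumed example start URL, for illustration only.
const startUrls = ['https://news.example.com/'];

// First run: storage is empty, so the start URL is processed.
const firstRun = new PersistedQueueSketch();
startUrls.forEach((url) => firstRun.addRequest(url));
const firstHandled = firstRun.run();

// Second run without purging: the start URL is already marked as handled,
// so nothing is pending and the run ends immediately.
const secondRun = new PersistedQueueSketch([...firstRun.handledKeys]);
startUrls.forEach((url) => secondRun.addRequest(url));
const secondHandled = secondRun.run();

console.log(firstHandled, secondHandled); // prints: 1 0
```

If this is what is happening, one option that may be worth trying is to give the start requests a `uniqueKey` that changes on every run, so that the listing pages are revisited while the article URLs stay deduplicated.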
6 comments