Apify Discord Mirror

Jeno
Joined August 30, 2024
BrightData's datacenter proxies can be used over SOCKS5, but only with remote DNS resolution, so the protocol has to be given as socks5h://...

Testing it with curl works, but in Crawlee it doesn't: the request just keeps hanging.

Plain Text
    proxyConfiguration: new ProxyConfiguration({
      newUrlFunction: () => {
        return 'socks5h://brd-customer-...-zone-...:...@brd.superproxy.io:22228';
      },
    })


Any idea how it should work?

Edit: since I use CamouFox, I tried:

Plain Text
        firefoxUserPrefs: {
          'network.proxy.socks_remote_dns': true,  // Enable remote DNS resolution
        },


But it still just hangs.
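
To narrow this down, it can help to log exactly what the proxy URL contains before it is handed to the crawler. The sketch below uses the standard WHATWG `URL` class to split a proxy URL into scheme, host, port and credentials, and flags whether the scheme asks for remote DNS resolution (`socks5h`, following the curl convention). `describeProxyUrl` is a hypothetical helper, not part of Crawlee, and the URL in the usage line is a placeholder. Whether the browser launched by Crawlee honours the `socks5h` scheme or SOCKS credentials is not confirmed here and is worth checking separately.

```javascript
// Hypothetical helper (not a Crawlee API): break a proxy URL into its parts.
function describeProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    // url.protocol includes the trailing colon, e.g. "socks5h:"
    const scheme = url.protocol.replace(/:$/, '');
    return {
        scheme,
        // By curl's convention, "socks5h" means the proxy resolves hostnames
        remoteDns: scheme === 'socks5h',
        host: url.hostname,
        port: Number(url.port),
        hasCredentials: url.username !== '' && url.password !== '',
    };
}

// Placeholder URL, not a real proxy endpoint
const info = describeProxyUrl('socks5h://user:pass@proxy.example.com:22228');
console.log(info);
```

Logging this object next to the curl command that works makes it easier to see whether the scheme or the credentials differ between the two setups.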
2 comments
According to this page, Crawlee with Playwright is detected as a bot because of CDP (Chrome DevTools Protocol):

https://www.browserscan.net/bot-detection

This is a known issue, also discussed on:

https://github.com/berstend/puppeteer-extra/issues/899

Could Crawlee come up with a solution?
15 comments
Jeno

WebRTC IP leak?

Hi, for the last couple of days I have been on a quest to evade detection for a project, which has proved quite challenging. While researching the issue, I noticed that my real IP leaks through WebRTC with a default Crawlee Playwright CLI project. I see a commit to the fingerprint-suite that I think should prevent that, but based on my tests it doesn't. Does it need any special setup?
2 comments
5 comments
I am having a hard time understanding sessions and proxies. I have the following crawler setup:

Plain Text
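import { PuppeteerCrawler } from 'crawlee';

// requestList, proxyConfiguration, router and PuppeteerExtra are defined elsewhere in the project.
const crawler = new PuppeteerCrawler({
    requestList,
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    requestHandler: router,
    requestHandlerTimeoutSecs: 100,
    headless: false,
    minConcurrency: 20,
    maxConcurrency: 30,
    launchContext: {
        launcher: PuppeteerExtra,
        // Open every page in its own incognito browser context
        useIncognitoPages: true,
    },
});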


Basically I want to run the same task concurrently with different proxies. Unless I set useIncognitoPages: true, only one session is used concurrently with one proxy. Is this how it should work? What is the point of having a session pool if only one is used?
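
The idea of tying each session to its own proxy can be illustrated without any crawler at all. A minimal sketch, assuming a fixed list of proxy URLs: `proxyForSession` is a hypothetical helper (not Crawlee's internal logic) that always maps the same session ID to the same proxy, so different sessions spread across the list. The URLs and session IDs below are placeholders. Crawlee's `newUrlFunction` receives a session ID, so a helper like this could in principle be called from there.

```javascript
// Hypothetical helper (not Crawlee's implementation): pick a proxy for a session
// so that the same session ID always gets the same proxy URL.
function proxyForSession(sessionId, proxyUrls) {
    if (proxyUrls.length === 0) {
        throw new Error('proxyUrls must not be empty');
    }
    // Simple string hash kept in the unsigned 32-bit range
    let hash = 0;
    for (const ch of String(sessionId)) {
        hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
    }
    return proxyUrls[hash % proxyUrls.length];
}

// Placeholder proxy URLs
const proxies = [
    'http://proxy-1.example.com:8000',
    'http://proxy-2.example.com:8000',
    'http://proxy-3.example.com:8000',
];

for (const sessionId of ['session_a', 'session_b', 'session_c']) {
    console.log(sessionId, '->', proxyForSession(sessionId, proxies));
}
```

A hash keeps the mapping stable across restarts without storing state, at the cost of an uneven spread when there are only a few sessions.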
5 comments
I have forks in my script, and if certain conditions are met I would like to stop the script. How should I do that? Calling page.close causes problems, especially when I run concurrently.

I am running a script that needs concurrency. I have 64 GB of RAM available and I want to use as much of it as possible. The script runs on a server, so there is not much else running. The problem is that at around 15 GB I always get a "memory overloaded" error.

I have tried:
Plain Text
config.set('memoryMbytes', 50_000)
config.set('availableMemoryRatio', 0.95)

Nothing seems to change this behavior. Anything else I can try?
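
One more thing that may be worth trying: Crawlee also reads the `CRAWLEE_MEMORY_MBYTES` and `CRAWLEE_AVAILABLE_MEMORY_RATIO` environment variables, which could help if the `config.set` calls run too late to take effect. A minimal sketch, assuming the variable is set before Crawlee is imported: `memoryBudgetMbytes` is a hypothetical helper, and the 64 GB and 0.95 figures are taken from the post. Whether this removes the overload error at around 15 GB has not been verified.

```javascript
// Hypothetical helper: convert total RAM in bytes and a usage ratio to whole megabytes.
function memoryBudgetMbytes(totalBytes, ratio) {
    if (!(ratio > 0 && ratio <= 1)) {
        throw new RangeError('ratio must be in the range (0, 1]');
    }
    return Math.floor((totalBytes / (1024 * 1024)) * ratio);
}

const totalBytes = 64 * 1024 ** 3; // 64 GB, as in the post
const budget = memoryBudgetMbytes(totalBytes, 0.95);

// Assumption: this runs before Crawlee is imported, so the value is picked up
// when its configuration is first read.
process.env.CRAWLEE_MEMORY_MBYTES = String(budget);
console.log(`CRAWLEE_MEMORY_MBYTES=${process.env.CRAWLEE_MEMORY_MBYTES}`);
```

Setting the same variable in the shell or service definition that starts the script avoids any ordering concerns inside the code.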
1 comment