Thanks for replying. The problem seems to be caused by `Task.async_stream`, which was hammering the proxy endpoint all at once.

Here's a simplified version of what I was using before:
```elixir
def crawler do
  Apps.list_apps()
  |> Task.async_stream(&update_app/1)
  # async_stream is lazy, so the stream is consumed elsewhere
end

def update_app(app) do
  url = app.url

  case HTTPoison.get(url, [],
         timeout: 10_000,
         recv_timeout: 10_000,
         follow_redirect: true,
         proxy: {"proxy.apify.com", 8000},
         proxy_auth: {"groups-RESIDENTIAL", @apify_proxy}
       ) do
    # ... handles a bunch of errors
  end
end
```
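In case it helps anyone else: one thing I'm going to try before switching libraries is throttling the stream itself. This is just a sketch (the option values are guesses, not tuned for Apify): `Task.async_stream` accepts `:max_concurrency`, `:timeout`, and `:on_timeout` options, so the requests can be capped instead of all firing at once.

```elixir
def crawler do
  Apps.list_apps()
  |> Task.async_stream(&update_app/1,
    # cap in-flight requests so the proxy isn't hit all at once
    max_concurrency: 5,
    # per-task timeout; must exceed HTTPoison's recv_timeout
    timeout: 30_000,
    # drop a slow task instead of crashing the whole stream
    on_timeout: :kill_task
  )
  # async_stream is lazy; force it to run
  |> Stream.run()
end
```

By default `max_concurrency` is `System.schedulers_online/0`, which on a many-core box can still mean a burst of simultaneous connections, so lowering it explicitly may be enough on its own.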
Here's the error I was getting:

```
[error] proxy error: "HTTP/1.1 590 UPSTREAM400\r\nConnection: close\r\nDate: Thu, 08 Aug 2024 14:08:48 GMT\r\nContent-Length: 0\r\n\r\n"
```
I'm going to try rebuilding the crawler with Crawly, which seems to be a port of Crawlee to Elixir. It might have something to do with your reply to my other thread.