Apify

Apify and Crawlee Official Forum

Marc Plouhinec
Joined August 30, 2024
Hello,

I can't push my actor anymore, as apify push returns the following error:
> apify push
Info: Deploying actor 'shopee-api-scraper' to Apify.
WARN  ApifyClient: API request failed 4 times. Max attempts: 9.
Cause:ApifyApiError: Unexpected error: "<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n"
  clientMethod: ActorVersionCollectionClient.create
  statusCode: 502
  type: undefined
  attempt: 4
  httpMethod: post
  path: /v2/acts/1fOnlbdlfcw62gGyD/versions
  stack: 
    at makeRequest (/opt/homebrew/lib/node_modules/apify-cli/node_modules/apify-client/dist/http_client.js:184:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async ActorVersionCollectionClient._create (/opt/homebrew/lib/node_modules/apify-cli/node_modules/apify-client/dist/base/resource_collection_client.js:23:26)
    at async PushCommand.run (/opt/homebrew/lib/node_modules/apify-cli/src/commands/push.js:139:13)
    at async PushCommand._run (/opt/homebrew/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
    at async Config.runCommand (/opt/homebrew/lib/node_modules/apify-cli/node_modules/@oclif/config/lib/config.js:173:24)
    at async Main.run (/opt/homebrew/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/main.js:28:9)
    at async Main._run (/opt/homebrew/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
    at async /opt/homebrew/lib/node_modules/apify-cli/src/bin/run:7:9
node:events:497
      throw er; // Unhandled 'error' event
      ^

...

Node.js v21.7.1


I'm using the latest version of the Apify CLI.

How can I solve this issue?
4 comments
Hi all!

My goal is to scrape a website built as a set of SPAs (single-page applications). The existing browser crawlers (i.e. PlaywrightCrawler and PuppeteerCrawler) don't look like a good fit, because each request is processed in a new page, which wastes resources.

What I need is to open one browser page and execute multiple XHR / fetch requests against the site's unofficial API until I get blocked, then open a new browser page and continue until all requests have been processed.
Note that I need a browser to pass fingerprint checks and to use the website's internal library to digitally sign each request to the unofficial API.

I'm thinking of solving this by writing a SinglePageBrowserCrawler that extends BasicCrawler and works similarly to BrowserCrawler but manages browser pages differently.
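
To make the page-management idea concrete, here is a minimal sketch of the reuse loop only, not of a full crawler. Everything in it is hypothetical and not part of Crawlee: the `PageSession` interface, the `ApiResult` shape, the `openPage` callback and the `maxReopens` limit are assumptions made for illustration. It keeps one page open, sends every request through it, and replaces the page only when a call reports that it was blocked.

```typescript
// Hypothetical result of one API call made from inside the page.
type ApiResult =
  | { status: "ok"; data: unknown }
  | { status: "blocked" };

// Hypothetical minimal view of a browser page: it can call the unofficial
// API from the page context and be closed afterwards.
interface PageSession {
  callApi(url: string): Promise<ApiResult>;
  close(): Promise<void>;
}

// Processes all URLs through a single page, opening a replacement page
// only after a "blocked" result. At most `maxReopens` replacement pages
// are opened before giving up.
async function processWithPageReuse(
  urls: string[],
  openPage: () => Promise<PageSession>,
  maxReopens = 3,
): Promise<{ url: string; data: unknown }[]> {
  const results: { url: string; data: unknown }[] = [];
  let page: PageSession | undefined;
  let reopens = 0;
  try {
    for (const url of urls) {
      let done = false;
      while (!done) {
        // Reuse the current page if there is one, otherwise open a new one.
        const current: PageSession = page ?? (await openPage());
        page = current;
        const result = await current.callApi(url);
        if (result.status === "ok") {
          results.push({ url, data: result.data });
          done = true;
        } else {
          // Blocked: drop this page and retry the same URL on a fresh one.
          await current.close();
          page = undefined;
          if (reopens >= maxReopens) {
            throw new Error(`Still blocked after ${maxReopens} page reopens`);
          }
          reopens += 1;
        }
      }
    }
  } finally {
    // Close whichever page is still open, even if an error was thrown.
    if (page) await page.close();
  }
  return results;
}
```

With Playwright or Puppeteer, `openPage` would presumably create a real page and `callApi` would run the signed fetch inside it through `page.evaluate`. Concurrency, request queueing and session rotation are deliberately left out of the sketch.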

Is this a good idea? Is there a better way to do this?

Thanks in advance for your feedback!
13 comments