Apify Platform Forum

Non Australian ip blocked

I am building a scraper for an Australian website, but Apify uses US IPs, so all the requests are getting blocked. Does Apify support using an Australian IP, or do I have to supply my own?

Honza Javorek

Making my Docker image more efficient

Ever since I added a browser to my stack, the builds take two minutes or more. I'm trying to make the builds more efficient, but I'm no expert in setting up the image, so I'd appreciate any help. This is what I do right now:

FROM apify/actor-python:3.12
ARG ACTOR_PATH_IN_DOCKER_CONTEXT

RUN rm -rf /usr/src/app/*
WORKDIR /usr/src/app

COPY . ./

RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing Poetry:" \
 && pip install --no-cache-dir poetry~=1.7.1 \
 && echo "Installing dependencies:" \
 && poetry config cache-dir /tmp/.poetry-cache \
 && poetry config virtualenvs.in-project true \
 && poetry install --only=main --no-interaction --no-ansi \
 && rm -rf /tmp/.poetry-cache \
 && echo "All installed Python packages:" \
 && pip freeze \
 && echo "Installing Playwright dependencies:" \
 && poetry run playwright install chromium --with-deps

RUN python3 -m compileall -q ./jg/plucker

ENV ACTOR_PATH_IN_DOCKER_CONTEXT="${ACTOR_PATH_IN_DOCKER_CONTEXT}"
CMD ["poetry", "run", "plucker", "--debug", "crawl", "--apify"]

Is there a way to make it faster?

Mahmudul Hasan Sagar

how to purchase increased actor RAM addon

I'm trying to buy additional RAM but could not find any option. How can I buy it?

3 comments

cvb941

Standby python actor throws when accessing the Actor.config object

2024-10-12T21:25:30.211Z ACTOR: Pulling Docker image of build IDpW06CSNrW8Wjb9L from repository.
2024-10-12T21:25:30.433Z ACTOR: Starting Docker container.
2024-10-12T21:25:31.942Z   File "/usr/src/app/src/main.py", line 76, in main
2024-10-12T21:25:31.944Z     site = web.TCPSite(runner, '0.0.0.0', Actor.config.standby_port)
2024-10-12T21:25:31.945Z                                           ^^^^^^^^^^^^
2024-10-12T21:25:31.947Z   File "/usr/local/lib/python3.12/site-packages/apify/_actor.py", line 65, in __init__
2024-10-12T21:25:31.949Z     self._configuration = configuration or Configuration.get_global_configuration()
2024-10-12T21:25:31.950Z                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-12T21:25:31.952Z   File "/usr/local/lib/python3.12/site-packages/crawlee/configuration.py", line 216, in get_global_configuration
2024-10-12T21:25:31.954Z     service_container.set_configuration(cls())
2024-10-12T21:25:31.955Z                                         ^^^^^
2024-10-12T21:25:31.957Z   File "/usr/local/lib/python3.12/site-packages/pydantic_settings/main.py", line 152, in __init__
2024-10-12T21:25:31.958Z     super().__init__(
2024-10-12T21:25:31.960Z   File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 209, in __init__
2024-10-12T21:25:31.962Z     validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
2024-10-12T21:25:31.963Z                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-12T21:25:31.965Z pydantic_core._pydantic_core.ValidationError: 1 validation error for Configuration
2024-10-12T21:25:31.967Z actor_timeout_at
2024-10-12T21:25:31.968Z   Input should be a valid datetime or date, input is too short [type=datetime_from_date_parsing, input_value='', input_type=str]
2024-10-12T21:25:31.970Z     For further information visit https://errors.pydantic.dev/2.9/v/datetime_from_date_parsing

2 comments

!!!Joefree!!! 👑

Minor Bug: README.MD

I used to be able to put an image in the README, complete with width and height:

<img src="https://example.com/image.jpg" width="200" height="300">

but now it seems the width and height are ignored, which makes my README uglier than ever before.

2 comments

Suhaida

Any way to scrape Shopee Malaysia using an API?

:perfecto:

3 comments

Ahmed Elkurdi

Is Apify Profitable?

Hello guys, I have a question. Is it worth being an Apify Actor developer? I mean, is it profitable? I developed 2 Actors with 100 users, but it's still not much at all.

3 comments

CORT

Running multiple usernames through the input

Hi guys! I am wondering whether I can put multiple usernames into the input variable in this Actor, or whether I need to loop through them to generate outputs for multiple usernames.

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "username": [
        "zelenskyy_official"
    ],
    "resultsLimit": 30
};

(async () => {
    // Run the Actor and wait for it to finish
    const run = await client.actor("xMc5Ga1oCONPmWJIa").call(input);

    // Fetch and print Actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
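
As a rough sketch of both options: `username` is already an array in the input above, so it may accept several entries in one run, but that depends on the Actor's input schema, which is not shown here. The helpers below (`buildInput`, `chunk` and `runForAll` are hypothetical names, not part of apify-client) either build one input with all usernames or fall back to calling the Actor once per batch.

```javascript
// Build one input object carrying several usernames.
// Assumption: the Actor reads every entry of the "username" array.
function buildInput(usernames, resultsLimit = 30) {
    // Trim whitespace, drop empty entries, and remove duplicates.
    const unique = [...new Set(usernames.map((u) => u.trim()).filter(Boolean))];
    return { username: unique, resultsLimit };
}

// Split a list into batches of at most `size` items.
function chunk(list, size) {
    const batches = [];
    for (let i = 0; i < list.length; i += size) {
        batches.push(list.slice(i, i + size));
    }
    return batches;
}

// Fallback loop: one run per batch, results collected into one array.
// `client` is an ApifyClient instance created as in the snippet above.
async function runForAll(client, actorId, usernames, batchSize = 1) {
    const allItems = [];
    for (const batch of chunk(usernames, batchSize)) {
        const run = await client.actor(actorId).call(buildInput(batch));
        const { items } = await client.dataset(run.defaultDatasetId).listItems();
        allItems.push(...items);
    }
    return allItems;
}
```

If the Actor does accept the whole array, a single `client.actor(...).call(buildInput(usernames))` is enough; `runForAll` is only needed when each run must handle a small batch.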

1 comment

cyyue

ApifyApiError:API end-point can only be accessed using the following HTTP methods: OPTIONS,GET, PUT

When I call the start method in actor.py of apify_client, the HTTP request method is 'POST', but it throws the error 'This API end-point can only be accessed using the following HTTP methods: OPTIONS, GET, PUT, DELETE'. Has anyone encountered this issue? I would like to know how to resolve it.

1 comment

Muhammet

Success Rate

As far as I can see, when a user manually stops the Actor, this decreases the success rate. I think the platform counts this as a failure. Do manual stops by the user affect the success rate? Is there any way to prevent this, or will there be a solution for it?

1 comment

which browser is the best to crawl

As the title says.

I'm currently using Chromium, but it is CPU-heavy.

Killing the browser does not kill the process, so it's easy to hit 100% CPU usage pretty quickly.

(I'm crawling thousands of websites, looking for different data on each.) I have already tried loading pure HTML without CSS, images, and other assets. That helped a lot, but the issue is still there.
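
For reference, the asset blocking described above could look roughly like this with Playwright's request interception. This is only a sketch: it assumes Playwright is the browser library in use, `page` is a Playwright `Page` created elsewhere, and the `BLOCKED_TYPES` set and helper names are illustrative.

```javascript
// Resource types to skip; adjust to what the target pages need (assumption).
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Pure helper: decide whether a request of this type should be aborted.
function shouldBlock(resourceType) {
    return BLOCKED_TYPES.has(resourceType);
}

// Register the filter on a Playwright page: abort blocked resource types
// and let every other request through unchanged.
async function blockHeavyAssets(page) {
    await page.route('**/*', (route) => {
        if (shouldBlock(route.request().resourceType())) {
            return route.abort();
        }
        return route.continue();
    });
}
```

Calling `await blockHeavyAssets(page)` before `page.goto(...)` applies the filter to that page only. It reduces network and rendering work, but it does not by itself address browser processes that stay alive after being killed.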

2 comments

didiraja

Crash on using Prisma, generate and schema

My Actor saves data using Prisma client. But when I run the Actor, the crawler says I need to run prisma generate, although I ran it after the build.

Any tips on how to solve it? The files are all in /myuser/ folder.

my package.json: https://pastebin.com/KqMYk7Ae
Apify build log: https://pastebin.com/71LrxCWN
Apify run log: https://pastebin.com/fg2dUW0C

2 comments

Louis Deconinck

Change latest version of actor

I introduced a new version of my actor, but how do I make it the latest version? I assume that the README is also taken from the latest version?

!!!Joefree!!! 👑

Web UI Bug ... again

I’m not sure if anyone else has experienced this or if it's just me.

When I don’t use an Actor for a long time, about a month or so, I sometimes notice that when I revisit the Actor console, the input fields are pre-filled by someone else, and I don’t know who. This is my own Actor.

These inputs may contain sensitive data like passwords, as shown in the screenshot.

I can't reproduce this issue, but my guess is that when my saved inputs 'expire,' they are automatically replaced by someone else’s inputs.

I hope someone from @Apifyteam can look into this.

Thank You!

7 comments

!!!Joefree!!! 👑

Minor UI Bug 2024-10-02

The table headers are blocking the dropdown.

IAmKing

Apify Api Data?

2 comments

Miso

Setting Default Memory for Actor in actor.json

Can anybody advise how to set the DEFAULT memory when creating an Actor? When I set "minMemoryMbytes": 128, "maxMemoryMbytes": 256, the Actor always complains that the prefilled 1GB is too much. I would like to know how to change that default 1GB in Run options to something lower. I tried setting "memoryMbytes": 128, but that doesn't work. Thanks

JPG

Apify Storage or straight to MongoDB?

I am unclear about the advantages of storing scraping results in Apify's native platform storage options. Why would I do that instead of just posting each result to my MongoDB collection as it comes in?

1 comment

JPG

Can't create new Actor in my Organization from a GitHub repo

I have created 5 Actor codebases. Each stored in their own private repo under my company's organization on GitHub.

I have successfully instantiated 4 Actors from these in my organization account on the Apify platform. They run and work fine. 💪

However, the 5th does not show up when I go to "Create New Actor from GitHub". The others do. My company account on Apify is the $49/month Starter package.

What's confusing is that when I go to my (free) Personal Apify account, all 5 GitHub repos show up as I would expect. I am able to create all 5 Actors and run them.

I have tried many things, but I can't figure out how to get it to work under my Organization account.

If anyone has any idea on what to try, I'd be super grateful. I have no idea how to get this fixed!

omfglolbbq

Scraping email addresses from list of websites/urls

How can I easily scrape the email addresses present in an input list of websites? I cannot easily find this function, which is probably a sought-after one! And how can I limit the output to only the email addresses found, per domain?

IAmKing

Can we make actors incorporating rust?

Can actors handle rust code?

Seba G

Can I pass parameters to an actor task via API?

Hi, I'm new to Apify. Apologies if this question has been answered before or if I'm posting in the wrong channel.

I'm running a task actor via API and retrieving the results once I receive a webhook indicating the run has completed. I want to pass additional parameters that are not part of the input and then retrieve them along with the results.

Is there a way to achieve this? If so, how?

azzouz

how is success % calculated/updated?

Hi team 👋 I'd like to know how the success % displayed on the Actor's page is calculated and how often it gets updated.

2 comments

Louis Deconinck

published actors & proxies

I've developed an actor which I would like to publish as it is finished. However, in order to scrape sufficient data, proxies would be necessary. How does this work and who pays for the proxies when the actor is published? I'm currently doing local development on a free account.

1 comment
