Apify and Crawlee Official Forum

Blitz
Joined August 30, 2024
I use webhook integrations for all the actors I use. The event types I subscribe to are run succeeded, created, failed, timed out, aborted, and resurrected.

If I hit my monthly Apify usage limit while an actor is running, will the webhook fire? What event type (if any) occurs when the usage limit is reached?
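A webhook receiver can at least make an unexpected stop visible by logging the run's status next to the event type. This minimal Python sketch assumes Apify's default webhook payload, with the event type in `eventType`, the run ID in `eventData.actorRunId`, and the run object (its `status` and, if present, `statusMessage`) in `resource`. It does not establish which event, if any, fires when the usage limit is reached.

```python
from typing import Any

# Event types matching the run events listed above.
# Assumption: these are the eventType strings in Apify's default webhook payload.
RUN_EVENT_TYPES = {
    "ACTOR.RUN.CREATED",
    "ACTOR.RUN.SUCCEEDED",
    "ACTOR.RUN.FAILED",
    "ACTOR.RUN.TIMED_OUT",
    "ACTOR.RUN.ABORTED",
    "ACTOR.RUN.RESURRECTED",
}

# Events that mean the run ended without succeeding.
UNSUCCESSFUL_EVENTS = {"ACTOR.RUN.FAILED", "ACTOR.RUN.TIMED_OUT", "ACTOR.RUN.ABORTED"}


def summarize_webhook(payload: dict[str, Any]) -> dict[str, Any]:
    """Pull the event type, run ID and run status out of a webhook payload.

    Assumed shape: {"eventType": ..., "eventData": {"actorRunId": ...},
    "resource": {"status": ..., "statusMessage": ...}}. Missing keys yield None.
    """
    event_type = payload.get("eventType", "")
    event_data = payload.get("eventData") or {}
    resource = payload.get("resource") or {}
    return {
        "event_type": event_type,
        "known_event": event_type in RUN_EVENT_TYPES,
        "run_id": event_data.get("actorRunId"),
        "status": resource.get("status"),
        # statusMessage (if present) may say why a run stopped; worth logging.
        "status_message": resource.get("statusMessage"),
        "needs_attention": event_type in UNSUCCESSFUL_EVENTS,
    }
```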
What status/error code (and message, if any) does the API return when the monthly (money) usage limit is hit?
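Whatever the exact code for this case turns out to be, logging the HTTP status together with the parsed error body makes it easy to spot. This sketch assumes the API reports failures as JSON of the form `{"error": {"type": ..., "message": ...}}` and falls back to the raw body otherwise; it does not claim which status or error type the usage-limit case actually returns.

```python
import json
from typing import Optional


def describe_api_error(status_code: int, body: str) -> str:
    """Return a readable one-line summary of a failed API call.

    Assumption: error bodies look like {"error": {"type": "...", "message": "..."}}.
    Anything else (non-JSON, other shapes) falls back to the raw body text.
    """
    error_type: Optional[str] = None
    message: Optional[str] = None
    try:
        parsed = json.loads(body)
        error = parsed.get("error") if isinstance(parsed, dict) else None
        if isinstance(error, dict):
            error_type = error.get("type")
            message = error.get("message")
    except json.JSONDecodeError:
        pass  # Not JSON; use the raw body below.
    if error_type is None and message is None:
        return f"HTTP {status_code}: {body.strip() or '<empty body>'}"
    return f"HTTP {status_code} [{error_type}]: {message}"
```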
Sorry if the title isn't great; I wasn't sure how to describe this.

I'm using the Google Maps Scraper Orchestrator actor (https://console.apify.com/actors/Uk8ZlE4NVYccUvpHw), which means I'm also using the Google Maps Scraper (https://console.apify.com/actors/nwua9Gu5YrADL7ZDj).

I'm storing certain results from the dataset in Firestore. However, my setup needs access to the location input used when running the actor for each specific result, not for the run as a whole.
Deriving the location from each result's city/state/country is not viable for me, and neither is any similar workaround.

Because the Orchestrator actor merges the results from its Google Maps Scraper runs into a single dataset, I can't easily tell which results came from which location input.

As far as I know, I can't add or edit data in these actors' result datasets directly.

The only other solution I can think of is to find the datasets of the related Google Maps Scraper runs and read those runs' inputs (instead of the orchestrator's). There's probably a way to do this, but I suspect it would get messy.

Am I missing some low-hanging fruit here?
If not, consider this post a feature request for the Orchestrator actor.

Help and input appreciated. Thanks :)
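The workaround described above (reading each Google Maps Scraper run's input instead of the orchestrator's) boils down to a join between each child run's input and its dataset items. A minimal sketch, assuming you have already fetched, for every child run, its input and its items; the `locationQuery` input field and the added `sourceLocation` key are assumptions/hypothetical names, not confirmed fields.

```python
from typing import Any


def tag_items_with_location(child_runs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach each child run's location input to every item that run produced.

    Hypothetical shape per entry: {"input": {"locationQuery": ...}, "items": [...]}.
    Returns new dicts; the original items are not modified.
    """
    tagged: list[dict[str, Any]] = []
    for run in child_runs:
        location = (run.get("input") or {}).get("locationQuery")
        for item in run.get("items", []):
            # Copy the item and add the location it was scraped for.
            tagged.append({**item, "sourceLocation": location})
    return tagged
```

Each tagged item can then be written to Firestore with its location attached, instead of reading the merged orchestrator dataset.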
I've set up a webhook integration for an actor that fires on success.

It works fine; however, I'm concerned about API security. Is there any way to send additional headers with the webhook (or get more reliable, static information)? I'd like my server to accept the webhook only if, for example, a token matches or the origin/referer is an Apify domain. The few keys and the ID currently sent in the headers don't seem to be static or retrievable.

I know I can add variables to the payload itself, but I want to reject the webhook at the header level, before parsing the payload.
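For a header-level check, a common pattern is a shared secret that the server compares in constant time before touching the body. Origin and Referer are not a reliable basis for this, since any HTTP client can set them. The sketch assumes the secret arrives in a hypothetical `X-Webhook-Token` header configured on the webhook side (if your setup allows custom headers); a secret in the webhook URL's query string is an alternative that can also be checked without parsing the payload.

```python
import hmac
from typing import Mapping

# Hypothetical header name; the same secret must be configured on the webhook.
TOKEN_HEADER = "x-webhook-token"


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Decide whether to accept a webhook request before its body is parsed.

    Header names are matched case-insensitively; the token is compared with
    hmac.compare_digest so the comparison time does not leak how much matched.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(TOKEN_HEADER)
    if supplied is None or not expected_token:
        return False
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```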
Fairly often (probably around 30% of my profile queries on each actor run), results come back with an 'unknown format'. It happens seemingly at random and doesn't always affect the same profile query; retrying that specific query can return the expected result.
Here are a couple of screenshots from the log and storage:
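Since retrying an affected query can return the expected result, one workaround is to re-run only the failed queries a bounded number of times. In this sketch `fetch` is a hypothetical callable that scrapes one profile query and returns its item, and detecting the failure through an `error` field containing "unknown format" is an assumption about the item shape.

```python
from typing import Any, Callable


def is_unknown_format(item: dict[str, Any]) -> bool:
    # Assumption: failed items mention "unknown format" in an "error" field.
    return "unknown format" in str(item.get("error", "")).lower()


def fetch_with_retries(
    queries: list[str],
    fetch: Callable[[str], dict[str, Any]],
    max_attempts: int = 3,
) -> dict[str, dict[str, Any]]:
    """Run every query, then re-run only those that came back as 'unknown format'.

    Queries still failing after max_attempts keep their last (failed) result.
    """
    results: dict[str, dict[str, Any]] = {}
    pending = list(queries)
    for _ in range(max_attempts):
        if not pending:
            break
        still_failing = []
        for query in pending:
            item = fetch(query)
            results[query] = item
            if is_unknown_format(item):
                still_failing.append(query)
        pending = still_failing
    return results
```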
I need to display profile pictures from scraped social media profiles to users on my website. After scraping a profile, I get back a URL for its profile picture.

I'm assuming I can't just have my website fetch each image from its URL, because I'd potentially need to do this millions of times, which would lead to IP bans and blacklisting. Is that right?

How should I deal with this? Is there a way to download images using Apify (ideally with settings to change resolution, size, etc.) so that I can store them in my own database?

Thanks :)
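One common pattern is to download each picture once, resize it, and serve the stored copy, so the site never requests the original URL per page view. This sketch covers the two pure pieces: a deterministic storage key (the `profile-pics/<size>/<sha256>.jpg` layout is hypothetical) so each URL/size pair is stored only once, and an aspect-preserving target size. The download and resize themselves would happen elsewhere, for example with Pillow's `Image.thumbnail`, which also preserves aspect ratio and never upscales.

```python
import hashlib


def storage_key(image_url: str, max_side: int) -> str:
    """Deterministic key for a cached copy of an image at a given size.

    The same URL and size always map to the same key, so a picture is
    downloaded once and then served from your own storage.
    """
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()
    return f"profile-pics/{max_side}/{digest}.jpg"


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Keeps the aspect ratio and never upscales smaller images.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1080x1350 picture and `max_side=320`, `fit_within` returns `(256, 320)`.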
I just tested apify/instagram-scraper, specifically its profile data retrieval. Unfortunately, it doesn't return some of the data I need.
I can get that data from https://www.instagram.com/{username}/?__a=1&__d=dis, but this supposedly requires being logged in, so I turned to Apify to see whether I could scrape the same data. Attached is an image with the data I'm looking for highlighted.

Is it possible to scrape this data through Apify, ideally with a pre-built actor already in the Store?