Apify Discord Mirror

Casper
Joined August 30, 2024
I am getting the error:
Plain Text
 Cannot find module '/home/myuser/dist/main.js' 

when trying to run a successful build of my Actor on Apify, even though it runs locally without errors.
I have disabled a lot of linting rules and warnings, as they serve no real purpose for me other than stopping my TypeScript compilation from succeeding.
It is a TypeScript Crawlee project.

Do you have any idea of what might be the cause?
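A common cause, as a guess: the TypeScript build step doesn't actually emit compiled output to dist/ (for example, noEmit left on, or a different outDir), so the main.js the image's start command expects never exists. A tsconfig.json fragment to check against (the values are assumptions about a typical setup):

```json
{
  "compilerOptions": {
    "outDir": "dist",
    "noEmit": false
  },
  "include": ["src/**/*"]
}
```

It is also worth checking that the Dockerfile's build step (e.g. npm run build) runs tsc before the start command looks for dist/main.js.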
9 comments
I have been having issues for quite a while with ts-node suddenly not understanding TypeScript files and therefore not compiling them to JavaScript that Node can run. I managed to fix it locally by using tsx instead of ts-node, but since tsx is not supported on apify.com, the run fails immediately.

In addition, when I look at the latest build log, I cannot tell whether the latest commit on my main branch on GitHub was used. I have cleaned up every build and version so that only the latest build is present. The Crawlee and Apify docs have not given me any direction for investigating this.

This has been a major headache for a while now so I hope I can get some guidance 🙏

Does anyone have similar bugs? 🤔
8 comments
"storages": {
  "companyInformationDataset": "./dataset_company_information_schema.json",
  "reviewsDataset": "./dataset_reviews_schema.json"
}

I have these storage schemas in my actor.json file, but the Apify UI does not pick them up to display the scraped data saved in them. As a result, users do not know whether scraped data was saved unless they check each named dataset individually. I do not save data to the unnamed (default) dataset.
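As far as I can tell from the Actor specification, the UI only reads a dataset schema for the default dataset, via the dataset key under storages; arbitrary keys such as companyInformationDataset are ignored. A minimal .actor/actor.json sketch (the name, version, and schema file path are placeholders):

```json
{
  "actorSpecification": 1,
  "name": "my-actor",
  "version": "0.1",
  "storages": {
    "dataset": "./dataset_schema.json"
  }
}
```

With views defined in that schema, the default dataset tab in the UI renders them; named datasets would still need to be checked individually.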
3 comments
I would like to add an input to my Actor like this:

  • "website1.com"
  • "website2.com"
How can I best do this in the Apify input schema, and how can I extract these URL strings in TypeScript code?

The approach below unfortunately does not work:
Plain Text
interface InputSchema {
  companyWebsites: string[];
  sortBy: string;
  filterByStarRating: string;
  filterBylanguage: string;
  filterByVerified: string;
  startFromPageNumber: number;
  endAtPageNumber: number;
}
const input = await Actor.getInput<InputSchema>();
let companyWebsites: string[] | undefined = input?.companyWebsites;
companyWebsites?.forEach(function (companyWebsite) {
  console.log(companyWebsite);
});
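For the schema side, a list of strings is usually modelled as an array property with the stringList editor in the input schema file; Actor.getInput() then returns it as string[], which matches the interface above. A hedged sketch (titles, descriptions, and prefill values are placeholders):

```json
{
  "title": "Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "companyWebsites": {
      "title": "Company websites",
      "type": "array",
      "editor": "stringList",
      "description": "Websites to scrape, one per line.",
      "prefill": ["website1.com", "website2.com"]
    }
  },
  "required": ["companyWebsites"]
}
```

If the schema was missing or the property had a different type, the TypeScript code above would receive undefined or a differently shaped value, which would explain the failure.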
11 comments
I can get the input from Apify in my Crawlee Playwright code and console.log() the start URLs, but I am not sure how to access them because they are typed as any instead of an array of strings. Can you provide some example code for extracting them so I can use them as start URLs in my code?
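One common gotcha, offered as a hedged sketch: the Apify UI's Start URLs editor typically stores an array of objects shaped like { url: string }, not plain strings, which is why the value shows up as any. A small type guard normalizes it (the names here are illustrative):

```typescript
// Hypothetical helper: normalize the Start URLs input (an array of
// { url: string } objects in the Apify UI) into plain string URLs.
interface StartUrlEntry {
  url: string;
}

function toUrlStrings(startUrls: unknown): string[] {
  if (!Array.isArray(startUrls)) return [];
  return startUrls
    .filter(
      (e): e is StartUrlEntry =>
        typeof e === "object" && e !== null && typeof (e as StartUrlEntry).url === "string",
    )
    .map((e) => e.url);
}
```

The resulting string[] can then be passed to the crawler as start URLs.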
28 comments
I have implemented pagination that can start from, e.g., page 2 and end at page 5 inclusive, scraping all the data from each page. It works correctly on my local machine, and I have pushed the newest working code (newest commit ID) to GitHub and then to Apify via a webhook. However, when I run the Actor on Apify.com, it starts at the first page instead of page 2 and does not finish at page 5 inclusive. Any suggestions on what might be wrong?
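One frequent cause is that the platform run uses the input stored with the Actor or task (or its defaults) rather than the local input, so the start page silently falls back to 1; logging the received input at startup usually reveals this. For the range itself, a hedged sketch that derives the page URLs up front so the same bounds apply locally and on the platform:

```typescript
// Illustrative sketch: build the URLs for an inclusive page range.
// The ?page= query parameter is an assumption — adapt it to the
// target site's pagination scheme.
function buildPageUrls(baseUrl: string, startPage: number, endPage: number): string[] {
  const urls: string[] = [];
  for (let page = startPage; page <= endPage; page++) {
    urls.push(`${baseUrl}?page=${page}`);
  }
  return urls;
}
```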
17 comments
How can I force the INPUT_SCHEMA.json file to update the input schema on Apify.com? These input fields should appear on the Actor's Apify Store page.
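If the project uses the .actor folder layout, the schema is, as far as I know, picked up from the path referenced in .actor/actor.json, and the Store page only refreshes after a new build of the Actor. A hedged fragment (the schema file name is a placeholder):

```json
{
  "actorSpecification": 1,
  "input": "./input_schema.json"
}
```

If the file sits in the repository but is not referenced here (or lives at an unexpected path), the platform keeps showing the old schema.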
5 comments
I have a paid actor and I have set up a scheduled task to daily check if it works for my customers. Will I automatically get an email if the run fails?
8 comments
I would really like the ability to see which customers use my paid Actor, so I can contact them and make sure the solution fits their needs perfectly.

Even better would be a way to make it more user-friendly for my customers, as some of them do not have much technical knowledge, so documentation can be hard to write.

In addition, it would make it possible for me to notify my customers of any breaking or non-breaking changes I have made to improve the Actor.

I would also like to see how much my customers are using the product (number of runs, whether they use API calls, amount of data retrieved, and so on) so that I can determine what needs to be improved; a customer might not always contact me, and I then lose them because the solution is not solving their issues.

Lastly, it would be really nice to get an email or a webhook from Apify when I have gained or lost a customer, so I don't have to log in to the platform manually to check.

These challenges make it difficult for me to provide adequate customer care and scale the number of customers using my paid Actors, so this would really help both me and the Apify platform.
7 comments
I would like to get this information to be able to create named datasets based on this run id.
3 comments
I just want to segment the code into small functions. The code below does not work because the type is not recognized.
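Without the original snippet it is hard to be precise, but the usual fix when extracting helpers is to annotate parameter and return types explicitly instead of letting them collapse to any (Crawlee also exports handler context types such as PlaywrightCrawlingContext for this). A self-contained sketch with hypothetical names:

```typescript
// Hypothetical example: an extracted helper with explicit parameter
// and return types, instead of untyped arguments inferred as `any`.
interface Review {
  author: string;
  rating: number;
}

function toReview(raw: { author?: unknown; rating?: unknown }): Review | null {
  if (typeof raw.author !== "string" || typeof raw.rating !== "number") return null;
  return { author: raw.author, rating: raw.rating };
}
```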
2 comments
I would like to get some feedback on what to log to users and customers of my actor.
At the moment I log the progression through each page and any data retrieved, but I think this is unnecessary.
I just want to inform the user of the progression and whether the run is working properly, but I am unsure what would provide the most value.

What do you do for your paid actors?
3 comments
I am trying to develop a Crawlee scraper locally, and in that regard I need to easily purge all data from the default and named datasets, as well as the request queues, to test my changes.
However, the storage is not being purged.
I intend to use the crawlee code in an Apify Actor.
Do you have any suggestions of what might be the issue?
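As far as I know, Crawlee only auto-purges the default storages on start; named datasets and queues are left alone, which may be why data lingers. A blunt local reset, assuming Crawlee's default ./storage directory layout:

```typescript
// Sketch: remove the local storage folders entirely. Assumes the
// default on-disk layout under ./storage; force: true makes missing
// directories a no-op instead of an error.
import { rmSync } from "node:fs";

function purgeLocalStorage(root = "./storage"): void {
  for (const dir of ["datasets", "key_value_stores", "request_queues"]) {
    rmSync(`${root}/${dir}`, { recursive: true, force: true });
  }
}
```

Running this before each test run gives a clean slate for both default and named storages.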
2 comments
I have managed to get some parts of it working, such that I have a Node.js API that starts my crawler.
I have yet to manage the request queue so it can handle additional and concurrent API calls, so I would like to know if someone has had any luck implementing such a solution.
My particular use case requires running this API in my own cloud instead of on Apify.
23 comments
How can I execute document.execCommand("insertText", false, "25810") from Playwright?
7 comments
I want to test different scenarios for how the scraper's input can be used, so I can run these tests to make sure users do not experience problems.
7 comments
Sometimes a dialog box might pop up on a site and I am not interested in the dialog and would just like it to be dismissed.
2 comments
I am trying to scrape a site that generates different CSS classes for my target elements each time the page renders. There are no other attributes to select on and no suitable parent elements to traverse, and I would prefer not to use XPath. Is it possible to decode this HTML to its original form to scrape it more easily?

Also, is there any technique that would make it possible to detect changes or additions of pages?
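The generated class names generally cannot be "decoded" back to stable ones, so the usual workaround is to anchor on something stable, such as visible label text, rather than classes (in Playwright, page.getByText() is the robust way to do this). As a rough, hypothetical illustration of the idea over raw HTML:

```typescript
// Illustrative only: find the value element that follows a stable text
// label, ignoring the randomized class names entirely. Assumes the
// label contains no regex metacharacters.
function extractLabeledValue(html: string, label: string): string | null {
  const re = new RegExp(`>${label}</[^>]+>\\s*<[^>]+>([^<]+)<`);
  const match = html.match(re);
  return match ? match[1].trim() : null;
}
```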
2 comments
Does someone know of a simple npm library to download files from a URL in Javascript/TypeScript?
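On Node 18+ you may not need a library at all: the global fetch plus node:fs/promises covers simple downloads. A hedged sketch with minimal error handling:

```typescript
// Minimal download sketch using Node 18+ built-ins: fetch the URL,
// buffer the body, and write it to disk. Returns bytes written.
import { writeFile } from "node:fs/promises";

async function downloadFile(url: string, dest: string): Promise<number> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed with status ${res.status}`);
  const buf = Buffer.from(await res.arrayBuffer());
  await writeFile(dest, buf);
  return buf.length;
}
```

For streaming large files, the got package or pipeline from node:stream/promises are common alternatives.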
8 comments
Is it possible to set a debug breakpoint in VS Code when writing TypeScript code for Crawlee, to inspect the values of variables and the flow of code execution?
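Yes. One approach is a launch.json that runs the entry point through a TypeScript loader such as tsx; breakpoints set in .ts files then bind via source maps. A sketch (the entry path and the use of tsx are assumptions about the project setup):

```json
{
  "type": "node",
  "request": "launch",
  "name": "Debug crawler",
  "runtimeExecutable": "npx",
  "runtimeArgs": ["tsx"],
  "program": "${workspaceFolder}/src/main.ts",
  "skipFiles": ["<node_internals>/**"]
}
```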
6 comments
I can see a Python SDK for Apify has been released. Is a Python SDK also planned for Crawlee with the same functionality as Javascript/Typescript with Cheerio and Playwright?
12 comments
How can I implement pagination starting from, e.g., page 10 to page 20, or to the last page?
Do I need to implement my own code for this, or does Crawlee provide something?
I am able to see the last page number on the first page of the website I am scraping.
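Crawlee does not paginate for you; you enqueue the page requests yourself. Since the last page number is visible on the first page, one sketch is to read it there and clamp the requested range before enqueueing (names are illustrative):

```typescript
// Illustrative sketch: clamp a requested inclusive page range to the
// last page discovered on the site, then return the pages to enqueue.
function pageRange(start: number, end: number, lastKnownPage: number): number[] {
  const stop = Math.min(end, lastKnownPage);
  const pages: number[] = [];
  for (let p = Math.max(1, start); p <= stop; p++) pages.push(p);
  return pages;
}
```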
2 comments
Is it possible to stop a crawler and resume it from the previous run's request queues?

I have a crawler that has run for a couple of hours locally, and I would like to add proxies to speed up processing because I am getting throttled on a single IP, but without starting from scratch, since that would be unnecessary and a waste of time. I want to reuse my existing request queues. Is this possible?

Also is this possible on Apify?
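Locally, Crawlee persists the request queue under ./storage, and as far as I know it only disappears because default storages are purged on start; disabling that should let a rerun (with the new proxy configuration) pick up where it left off. One way is via an environment variable:

```
CRAWLEE_PURGE_ON_START=false
```

On Apify, the closest equivalent I am aware of is resurrecting a finished or aborted run from its detail page, which reuses that run's request queue.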
4 comments
I have managed to find the elements for some dropdowns, but I can't find the element for one specific dropdown.
Is there any HTML attribute I can look at to determine which elements can be clicked? There does not seem to be an onClick() handler or anything similar.
12 comments
Currently I have an image embed, ![Sample reviews](...), in my README; it does not render locally or on Apify, but it does render when viewing the README on GitHub.
14 comments