Apify

Apify and Crawlee Official Forum

Members
makditeam
Joined August 30, 2024
I'm using PuppeteerCrawler with keepAlive set to true and calling crawler.run() without await.

This keeps the crawler running indefinitely, and any new requests I add to the request queue get processed.

(I'm using a non-persisted request queue.)

What I want is to close the crawler gracefully: when I get a signal to shut down, I want to process all pending requests in the request queue first and only then stop the crawler.

Right now, if I call crawler.teardown(), it abruptly closes the crawler instance without processing the pending requests.
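
One way to picture the behaviour being asked for is a "drain, then tear down" helper. The sketch below is not a Crawlee feature. It assumes the queue exposes an `isFinished()` method (Crawlee's RequestQueue has one) and the crawler exposes `teardown()`. The names `drainThenTeardown`, `DrainOptions`, `pollIntervalMs`, and `timeoutMs` are hypothetical. It also assumes that nothing adds new requests once the shutdown signal has arrived.

```typescript
// Minimal structural types so the sketch does not depend on crawlee being installed.
interface DrainableQueue {
  // Assumed to resolve to true once every request has been handled.
  isFinished(): Promise<boolean>;
}

interface ClosableCrawler {
  teardown(): Promise<void>;
}

// Hypothetical options for this sketch.
interface DrainOptions {
  pollIntervalMs?: number; // how often to re-check the queue
  timeoutMs?: number; // upper bound on how long to wait before giving up
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the queue until it reports finished (or the timeout passes),
// then tears the crawler down. Resolves to true if the queue drained in time.
async function drainThenTeardown(
  crawler: ClosableCrawler,
  queue: DrainableQueue,
  options: DrainOptions = {},
): Promise<boolean> {
  const pollIntervalMs = options.pollIntervalMs ?? 500;
  const timeoutMs = options.timeoutMs ?? 60000;
  const deadline = Date.now() + timeoutMs;

  let drained = await queue.isFinished();
  while (!drained && Date.now() < deadline) {
    await sleep(pollIntervalMs);
    drained = await queue.isFinished();
  }

  // Teardown happens either way, so a stuck request cannot block shutdown forever.
  await crawler.teardown();
  return drained;
}
```

In a real setup this would probably be called from a SIGTERM or SIGINT handler, passing the crawler and its request queue. The timeout is a design choice: without it, a request that never completes would keep the process alive indefinitely.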
2 comments
Hi, I'm using Crawlee's PuppeteerCrawler.

I've added crawlee to my package.json like this:

{
   "crawlee": "^3.5.4"
}


Should I also add puppeteer to my package.json?

I see puppeteer listed in crawlee's peer dependencies, but marked as optional: https://www.npmjs.com/package/crawlee?activeTab=code

The issue is that puppeteer is released very often with bug fixes, and I've been stuck on puppeteer 21.1.x.

Ideally, I would depend only on Crawlee, and bumping Crawlee would also bump puppeteer to a version that Crawlee requires or supports.

Reference: https://github.com/apify/crawlee/discussions/2101
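
For context, npm does not automatically install peer dependencies that a package flags as optional in `peerDependenciesMeta`, so an optional peer generally has to be listed by the application itself. The sketch below only illustrates that check. The function name `missingOptionalPeers` and the `Manifest` type are hypothetical, and the library manifest in the example is a made-up excerpt with a placeholder range, not crawlee's actual package.json.

```typescript
// Subset of package.json fields relevant to this check.
interface Manifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
  peerDependenciesMeta?: Record<string, { optional?: boolean }>;
}

// Returns the optional peer dependencies of `lib` that `app` does not declare itself.
function missingOptionalPeers(app: Manifest, lib: Manifest): Record<string, string> {
  const declared: Record<string, string> = { ...app.dependencies, ...app.devDependencies };
  const peers = lib.peerDependencies ?? {};
  const meta = lib.peerDependenciesMeta ?? {};
  const missing: Record<string, string> = {};

  for (const name of Object.keys(peers)) {
    const isOptional = meta[name]?.optional === true;
    if (isOptional && !(name in declared)) {
      missing[name] = peers[name];
    }
  }
  return missing;
}

// The application manifest from the post above.
const appManifest: Manifest = { dependencies: { crawlee: "^3.5.4" } };

// Hypothetical library manifest excerpt; "*" is a placeholder range.
const libManifest: Manifest = {
  peerDependencies: { puppeteer: "*" },
  peerDependenciesMeta: { puppeteer: { optional: true } },
};

// Lists puppeteer as an optional peer the application has not declared.
console.log(missingOptionalPeers(appManifest, libManifest));
```

If the result is non-empty, the listed packages would need to be added to the application's own dependencies, at a version within the range the library declares.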
2 comments
Hello,

I have a use case where requests in the RequestQueue need to expire after a specified time (e.g., 30 minutes). Is this currently possible?

One possible approach is to store an epoch timestamp in userData when enqueuing a request. Then, when the request reaches the preNavigationHooks phase, I can compare the elapsed time against the limit and throw a NonRetryableError to prevent any further processing of the request.
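
The approach just described can be sketched roughly as follows. To keep the sketch runnable on its own, `NonRetryableError` is a local stand-in for the class of the same name that crawlee exports. The names `stampRequest`, `assertNotExpired`, `enqueuedAt`, and `MAX_AGE_MS` are hypothetical.

```typescript
// Local stand-in for crawlee's NonRetryableError, so the sketch has no dependencies.
class NonRetryableError extends Error {}

// Only the parts of a request this sketch needs.
interface QueuedRequest {
  url: string;
  userData: { enqueuedAt?: number };
}

// 30 minutes, matching the example limit in the post.
const MAX_AGE_MS = 30 * 60 * 1000;

// Builds a request that carries its enqueue time (epoch milliseconds) in userData.
function stampRequest(url: string, now: number = Date.now()): QueuedRequest {
  return { url, userData: { enqueuedAt: now } };
}

// Throws if the request has waited in the queue longer than maxAgeMs.
// Requests without a timestamp are treated as never expiring.
function assertNotExpired(
  request: QueuedRequest,
  now: number = Date.now(),
  maxAgeMs: number = MAX_AGE_MS,
): void {
  const enqueuedAt = request.userData.enqueuedAt;
  if (enqueuedAt === undefined) return;

  const ageMs = now - enqueuedAt;
  if (ageMs > maxAgeMs) {
    throw new NonRetryableError(`Request ${request.url} expired after ${ageMs} ms in the queue`);
  }
}
```

With a real crawler, `stampRequest` would supply the object passed when adding requests, and `assertNotExpired` would be called with the hook's `request` from an entry in `preNavigationHooks`.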

However, this approach may not be the most elegant solution, and it has the side effect of creating a page object, which in turn opens a browser and creates an empty tab, consuming unnecessary resources.

Is there a cleaner, more efficient way to handle request expiration that avoids this resource overhead?
1 comment