Apify

Apify and Crawlee Official Forum

smasher
Joined August 30, 2024
I'm developing a Playwright scraper to do some basic things. After it finishes processing the URLs, it doesn't stop: my terminal stays stuck until I press CTRL + C.
Is there a flag I should enable?
1 comment
I created a new crawler with the npx crawlee create project command, which generates some folders and files, including a router.js file that contains an instance of createPlaywrightRouter:
Plain Text
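// createPlaywrightRouter and Dataset are both exported by the crawlee package.
import { createPlaywrightRouter, Dataset } from 'crawlee';

export const router = createPlaywrightRouter();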

router.addDefaultHandler(async ({ enqueueLinks, log }) => {
    log.info(`enqueueing new URLs`);
    await enqueueLinks({
        globs: ['https://crawlee.dev/**'],
        label: 'detail',
    });
});

router.addHandler('detail', async ({ request, page, log }) => {
    const title = await page.title();
    log.info(`${title}`, { url: request.loadedUrl });

    await Dataset.pushData({
        url: request.loadedUrl,
        title,
    });
});

As I understand it, the default handler is kind of the "main" listener, and the "detail" route is invoked later through the label passed to enqueueLinks. This could be a nice way to split the process into more "routes"/steps, so it ends up cleaner and more decoupled.
My question is: how do I call or invoke a route without the enqueueLinks function?
I was expecting something like:
Plain Text
router.addDefaultHandler(async (ctx) => {
    await ctx.invoke('extract-meta-data')
    await ctx.invoke('extract-detail')
    await ctx.invoke('download-files')
});

Where can I see the functions this ctx supports? Or maybe I have misunderstood the router entirely.
Thanks 🙂
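
One possible way to get this step-by-step structure without enqueueLinks is to keep each step as a plain async function and call them in order from a single handler. This is only a minimal sketch under assumptions: extractMetaData, extractDetail, and runSteps are hypothetical names, not part of Crawlee, and the only context members used are page.title() and request.loadedUrl, both taken from the generated router.js above.

```javascript
// Hypothetical step: reads the page title from the handler context.
async function extractMetaData(ctx, result) {
    result.title = await ctx.page.title();
}

// Hypothetical step: records the URL that was actually loaded.
async function extractDetail(ctx, result) {
    result.url = ctx.request.loadedUrl;
}

// Hypothetical helper: runs every step in order against the same context
// and collects what the steps write into one shared result object.
async function runSteps(ctx, steps) {
    const result = {};
    for (const step of steps) {
        await step(ctx, result);
    }
    return result;
}

// Assumed wiring inside router.js (not executed here):
// router.addDefaultHandler(async (ctx) => {
//     const result = await runSteps(ctx, [extractMetaData, extractDetail]);
//     await Dataset.pushData(result);
// });
```

With this layout the steps stay decoupled and can be tested on their own, but they all run inside the same request. Labeled routes are still the way to go when a step needs its own page load.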
4 comments