Apify

Apify and Crawlee Official Forum

saving data in apify actor and cleaning

I've tried saving the data I scrape with my Actors to a rawData.json file,

however I don't get a JSON output even though the scraping works.

How would I save the data to the Apify Console so that I can then use MongoDB to take that data and put it in my database?

I have my MongoDB schema already set up, so how would I save the data to the Apify Console and access it?

Would I have to save it to the Apify dataset? If so, how? And how would I also put it through a cleaning process, in the same Actor or, if possible, a different Actor, and THEN save it to a MongoDB database?

Here's what I have for saving the JSON file so far:
bambawRouter.addHandler('BAMBAW_PRODUCT', async ({ page, request }) => {
    try {
        console.log('Scraping products');

        const site = 'Bambaw';

        const title = await page.$eval('h1.product__title', (el) => el.textContent?.trim() || '');

        const descriptions = await ......

        const productData = {
            url: request.loadedUrl,
            site,
            title,
            descriptions,
            originalPrice,
            salePrice,
            shippingInfo,
            reviewScore,
            reviewNumber,
        };

        productList.push(productData);

        console.log('Scraped ', productList.length, ' products');
        // Read the existing data from the rawData.json file
        let rawData: any = {};
        try {
            const rawDataStr = fs.readFileSync('rawData.json', 'utf8');
            rawData = JSON.parse(rawDataStr);
        } catch (error) {
            console.log('Error reading rawData.json:', error);
        }

        // Append the new data to the existing data
        if (rawData.productList) {
            rawData.productList.push(productData);
        } else {
            rawData.productList = [productData];
        }

        // Write the updated data back to the rawData.json file
        fs.writeFileSync('rawData.json', JSON.stringify(rawData, null, 2));
        console.log('rawData.json updated for Bambaw');
    } catch (error) {
        console.log('Error scraping product:', error);
        // reclaimRequest returns a promise, so wait for it before leaving the handler
        await bambawQueue.reclaimRequest(request);
        return;
    }
});
Hmm... this should generally work. The question might be where the file is saved. You can find examples of working with Dataset here: https://crawlee.dev/api/core/class/Dataset (this will generate a new file in the storage folder for each item in the dataset). You should even be able to send it to MongoDB directly, depending on your use case.
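To make the Dataset suggestion concrete, here is a minimal sketch of pushing each scraped product to a dataset instead of rewriting rawData.json. It is written against a small hypothetical interface, `DatasetLike`, so it runs without any dependency; the assumption is that in a real Actor the `dataset` argument would be Crawlee's own dataset, for example the result of `await Dataset.open()` after `import { Dataset } from 'crawlee'`. `saveProduct`, `InMemoryDataset` and the trimmed-down `ProductData` type are illustrative names, not part of Crawlee or of the original code.

```typescript
// Hypothetical, trimmed-down version of the productData object from the handler.
type ProductData = {
    url: string;
    site: string;
    title: string;
};

// Assumption: the only dataset method this sketch needs is pushData(item).
// A Crawlee dataset instance exposes a pushData method with this role.
interface DatasetLike {
    pushData(item: ProductData): Promise<void>;
}

// Stores one scraped product. Each call adds one item to the dataset,
// so there is no need to read, merge and rewrite a JSON file.
async function saveProduct(dataset: DatasetLike, product: ProductData): Promise<void> {
    await dataset.pushData(product);
}

// In-memory stand-in so the sketch can be tried without Crawlee installed.
class InMemoryDataset implements DatasetLike {
    readonly items: ProductData[] = [];

    async pushData(item: ProductData): Promise<void> {
        this.items.push(item);
    }
}
```

Inside the route handler, the call would replace the fs.readFileSync / fs.writeFileSync block, roughly `await saveProduct(dataset, productData);`. One likely reason the file approach only appears to work locally is that a file written with fs during a platform run stays on that run's own filesystem, whereas dataset items are what the Apify Console shows as the run's output.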
Would I have to install the fs dependency? If so, how?
No, the fs module is part of the Node.js installation.
Does this work in an Actor? It only seems to work on my local computer.
So I am not sure where you run it. You should be fully in control of where and how you run it. Are you running it on the Apify Platform? If so, you can send me a link to the run in a DM so I can check it.
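On the cleaning step asked about in the original post, one option is a small pure function that runs on each record before it is saved, whether in the same Actor or in a separate one that reads the dataset. The sketch below is only an illustration under stated assumptions: that `originalPrice` and `salePrice` are scraped as strings such as "€12,90", that prices have no thousands separators, and that records with an empty title should be dropped. `RawProduct`, `CleanProduct`, `parsePrice` and `cleanProduct` are hypothetical names.

```typescript
// Assumed shape of a scraped record (subset of the fields in the handler).
type RawProduct = {
    url: string;
    title: string;
    originalPrice: string;
    salePrice: string;
};

type CleanProduct = {
    url: string;
    title: string;
    originalPrice: number | null;
    salePrice: number | null;
};

// Turns a price string like "€12,90" into a number, or null if no number is found.
// Assumption: no thousands separators, and a comma means a decimal point.
function parsePrice(text: string): number | null {
    const normalized = text.replace(/[^0-9.,]/g, '').replace(',', '.');
    if (normalized === '') return null;
    const value = Number.parseFloat(normalized);
    return Number.isNaN(value) ? null : value;
}

// Normalizes one record; returns null when the record should be skipped.
function cleanProduct(raw: RawProduct): CleanProduct | null {
    // Collapse runs of whitespace (including newlines) into single spaces.
    const title = raw.title.replace(/\s+/g, ' ').trim();
    if (title === '') return null;
    return {
        url: raw.url,
        title,
        originalPrice: parsePrice(raw.originalPrice),
        salePrice: parsePrice(raw.salePrice),
    };
}
```

The cleaned object, rather than the raw one, would then be what gets pushed to the dataset or inserted into MongoDB, so the database only ever sees records that match the schema.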