Apify and Crawlee Official Forum

Updated 3 months ago

Adding request via crawler.addRequest([]) is slow in express.js app.post() method

Dear all, I am building a simple API that, upon a call, adds URLs via the crawler.addRequests() method. The first call is quite fast, but the second and subsequent calls are extremely slow. I suspected this delay might come from me not using the request queue properly. This is what I found in the docs.


Note that RequestList can be used together with RequestQueue by the same crawler. In such cases, each request from RequestList is enqueued into RequestQueue first and then consumed from the latter. This is necessary to avoid the same URL being processed more than once (from the list first and then possibly from the queue). In practical terms, such a combination can be useful when there is a large number of initial URLs, but more URLs would be added dynamically by the crawler.
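In Crawlee itself, this combination means passing both a requestList and a requestQueue in the crawler's constructor options. The sketch below is only a toy model of the deduplication behaviour the docs describe, in plain JavaScript rather than the Crawlee API: list requests drain into the queue first, and the queue drops any later request whose unique key it has already seen.

```javascript
// Toy model of the RequestList + RequestQueue combination (NOT the Crawlee
// API): every request from the static start list is enqueued first, and the
// queue deduplicates by unique key, so a URL discovered again while crawling
// is only processed once.
class ToyQueue {
  constructor() {
    this.seen = new Set();
    this.pending = [];
  }
  addRequest(req) {
    const key = req.uniqueKey ?? req.url; // Crawlee defaults the key to the URL
    if (this.seen.has(key)) return false; // duplicate, dropped
    this.seen.add(key);
    this.pending.push(req);
    return true;
  }
}

const startList = [
  { url: 'https://example.com/a' },
  { url: 'https://example.com/b' },
];

const queue = new ToyQueue();
// The start list drains into the queue first...
startList.forEach((req) => queue.addRequest(req));
// ...then a URL discovered during the crawl is deduplicated against it.
queue.addRequest({ url: 'https://example.com/a' });

console.log(queue.pending.length); // 2 — the duplicate was dropped
```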

Can someone please give me a sample code?
5 comments
Hello, could you post a bit more code? Are you running the code locally or on the Apify platform? Is addRequest the only thing that your express.js endpoint does? How many requests are you adding at once this way?
Yeah sure, I'm adding it here. I am using Crawlee locally, not the Apify platform.

app.post('/scrape', async (req, res) => {
  try {
    var startUrls = [];
    const AsinData = req.body.AsinList;
    if (typeof AsinData === 'undefined') {
      return res.status(400).json({ error: 'AsinList is undefined' });
    }
    if (AsinData.length === 0) {
      return res.status(400).json({ error: 'AsinList is empty' });
    } else {
      const regex = /^[A-Z0-9]{10}$/;
      const isValid = AsinData.every((item) => regex.test(item));
      if (isValid) {
        console.log("All items in the list meet the criteria.");
      } else {
        return res.status(400).json({ error: 'All ASINS should match the patterns e.g. B0BM4ZPNV1' });
      }
    }
    const queue = await RequestQueue.open("test");
    console.log("Asin data", AsinData);
    await AsinData.forEach((ASIN) => {
      var url_per_asin = {
        url: `${BASE_URL}/gp/ajax/asin=${ASIN}`,
        userData: { label: 'test', keyword: ASIN },
        uniqueKey: uuidv4(),
      };
      queue.addRequests([url_per_asin]);
      startUrls.push(url_per_asin);
    });
    return res.send("Fetch started..");
  } catch (error) {
    // Handle any errors that occur
    console.error(error);
    res.status(500).send('Internal server error');
  }
});
I see that you are calling queue.addRequests inside await AsinData.forEach.

So the first optimization I would make is: instead of adding to the queue inside the forEach, accumulate the requests into a list and then do a single queue.addRequests call after the forEach. (Note also that await AsinData.forEach doesn't actually wait for anything: Array.prototype.forEach returns undefined, so the response can be sent before the requests finish enqueueing.) I'm not sure why startUrls is there or what its purpose is. You may also use console.time and console.timeEnd (with unique labels for each request) to investigate what is causing the long response times: https://developer.mozilla.org/en-US/docs/Web/API/console/timeEnd
Hey, thanks for looking into my code. After making the suggested change it is faster now. I also opened the queue beforehand and assigned it to the crawler, and in the app.post handler I simply add more requests to the queue without awaiting.
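The pattern described here (open the queue once at startup, reuse it in every handler) can be sketched without Crawlee. Below, openQueue is a stand-in for RequestQueue.open('test'), and handleScrape plays the role of the app.post handler body; the point is that the expensive open happens once, not per HTTP request.

```javascript
// Sketch of the "open once, reuse per request" pattern.
let openCount = 0;

async function openQueue() {
  openCount += 1; // simulate the expensive open (stand-in for RequestQueue.open)
  return {
    requests: [],
    async addRequests(reqs) {
      this.requests.push(...reqs);
    },
  };
}

// Opened once at app startup and shared by every handler call.
const queuePromise = openQueue();

// Express-style handler body: reuse the shared queue, enqueue, respond.
async function handleScrape(asins) {
  const queue = await queuePromise; // resolves instantly after the first call
  await queue.addRequests(asins.map((a) => ({ url: `https://example.com/${a}` })));
  return 'Fetch started..';
}

(async () => {
  await handleScrape(['B0BM4ZPNV1']);
  await handleScrape(['B0BM4ZPNV2']);
  console.log(openCount); // 1 — the queue was opened only once
})();
```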