Error adding request to the queue: Request ID does not match its unique_key.

Hi and good day. I'm creating a POST API that accepts the following JSON body:
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

The targets list contains the file extensions my code downloads whenever it discovers them.

I'm already at my wit's end since I don't understand the error I'm getting, which is:
[crawlee.memory_storage_client._request_queue_client] WARN Error adding request to the queue: Request ID does not match its unique_key.

Has anyone encountered this problem?

The following is my whole code:
Attachments: image.png, image.png, image.png
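Since the code is attached only as screenshots, here is a rough sketch of the kind of FastAPI entry point being described (CrawlRequest, the /crawl path, and run_crawler are illustrative placeholders, not the poster's actual code):

Python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CrawlRequest(BaseModel):
    url: str
    targets: list[str]

@app.post("/crawl")
async def crawl(body: CrawlRequest) -> dict:
    # Hand the URL and target extensions over to the crawler.
    await run_crawler(body.url, body.targets)  # hypothetical helper, not shown
    return {"status": "started"}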
Hi, are there any developers who can help me?
cc @Vlada Dusek πŸ™
I have never explicitly set the id of a request; what is the purpose? I think it is colliding with an internal Crawlee mechanism that sets the id automatically. You can set just the unique_key.
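In other words, the warning usually means a request was created with an explicit id that does not match the id Crawlee derives from its unique_key. A minimal sketch of the difference, assuming a Crawlee for Python version where Request.from_url accepts an explicit id (values are illustrative):

Python
from crawlee import Request

# Likely to trigger the warning: a hand-picked id that Crawlee did not
# derive from the unique_key.
colliding = Request.from_url(
    "https://crawlee.dev/python/",
    unique_key="my-key",
    id="my-custom-id",
)

# Safer: set only the unique_key and let Crawlee compute the matching id.
deduplicated = Request.from_url(
    "https://crawlee.dev/python/",
    unique_key="my-key",
)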
Hi @Nyanmaru

I think you need to use

Python
from crawlee import Request

requests = [Request.from_url(
    url=start_url,
    user_data={"targets": targets},
    unique_key=request_id,
)]
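For completeness, that list would then typically be passed to the crawler's run call, for example:

Python
# Assuming a crawler instance (e.g. BeautifulSoupCrawler) named `crawler`.
await crawler.run(requests)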
I'm trying to make a POST endpoint using FastAPI that accepts a JSON body containing the URL and the target extensions. I want to keep it flexible by letting me specify which extensions to download. For example, in my JSON body I inserted:
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

My program will crawl the URL provided in the JSON body and then download every file that has the html or pdf extension; I can add more extensions, like png or jpg.
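A rough sketch of that flow, assuming a BeautifulSoupCrawler, targets forwarded via user_data, and a hypothetical download_file helper (import paths can differ between Crawlee versions):

Python
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler()

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    targets = context.request.user_data.get("targets", [])
    url = context.request.url

    # Download the current page if its extension is one of the targets.
    if any(url.endswith(f".{ext}") for ext in targets):
        await download_file(url)  # hypothetical helper

    # Keep crawling and forward the targets to newly enqueued requests.
    await context.enqueue_links(user_data={"targets": targets})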
Hi @Mantisus, your solution worked! The only problem is that the targets only stick to the first crawl and then disappear on the next URL.
Attachment: image.png
Hi @HonzaS, I'm actually trying to make a POST request using FastAPI that accepts a JSON body containing the URL and the target extensions of the files I want to download from every page I crawl. For example, in my JSON body I inserted:
on my JSON body I inserted
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

My program will crawl the URL provided in the JSON body and then download every file that has the html or pdf extension; I can add more extensions, like png or jpg.
Hi everyone! Glad to say this finally worked! I fixed the last problem I encountered by adding the following to my enqueue_links call:
await context.enqueue_links(user_data={"targets": targets})

Thank you to everyone who answered! πŸ˜„