Error adding request to the queue: Request ID does not match its unique_key.

Hi and good day. I'm creating a POST API that accepts the following JSON body:
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

The targets list contains the file extensions my code downloads whenever it discovers them.

I'm already at my wit's end since I don't understand the error I'm getting, which is:
[crawlee.memory_storage_client._request_queue_client] WARN Error adding request to the queue: Request ID does not match its unique_key.

Has anyone encountered this problem?

The following is my whole code:
Attachments: image.png, image.png, image.png
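Since the code is attached only as screenshots, here is a rough sketch of the kind of FastAPI entry point being described (CrawlRequest, the /crawl path, and run_crawler are illustrative placeholders, not the poster's actual code):

Python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CrawlRequest(BaseModel):
    url: str
    targets: list[str]

@app.post("/crawl")
async def crawl(body: CrawlRequest) -> dict:
    # Hand the URL and target extensions over to the crawler.
    await run_crawler(body.url, body.targets)  # hypothetical helper, not shown
    return {"status": "started"}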
Hi, are there any developers who can help me?
cc @Vlada Dusek πŸ™
I have never explicitly set the id of a request; what is the purpose? I think it is colliding with an internal Crawlee mechanism that sets the id automatically. You can set just the unique_key.
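In other words, the warning usually means a request was created with an explicit id that does not match the id Crawlee derives from its unique_key. A minimal sketch of the difference, assuming a Crawlee for Python version where Request.from_url accepts an explicit id (values are illustrative):

Python
from crawlee import Request

# Likely to trigger the warning: a hand-picked id that Crawlee did not
# derive from the unique_key.
colliding = Request.from_url(
    "https://crawlee.dev/python/",
    unique_key="my-key",
    id="my-custom-id",
)

# Safer: set only the unique_key and let Crawlee compute the matching id.
deduplicated = Request.from_url(
    "https://crawlee.dev/python/",
    unique_key="my-key",
)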
Hi @Nyanmaru

I think you need to use

Python
from crawlee import Request

requests = [Request.from_url(
    url=start_url,
    user_data={"targets": targets},
    unique_key=request_id,
)]
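For completeness, that list would then typically be passed to the crawler's run call, for example:

Python
# Assuming a crawler instance (e.g. BeautifulSoupCrawler) named `crawler`.
await crawler.run(requests)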
I'm trying to make a POST endpoint using FastAPI that accepts a JSON body containing the URL and the target extensions. I want to keep it flexible by letting me specify which extensions to download. For example, in my JSON body I inserted:
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

My program will crawl the URL provided in the JSON body and then download every file that has the html or pdf extension; I can add more extensions, like png or jpg.
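A rough sketch of that flow, assuming a BeautifulSoupCrawler, targets forwarded via user_data, and a hypothetical download_file helper (import paths can differ between Crawlee versions):

Python
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler()

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    targets = context.request.user_data.get("targets", [])
    url = context.request.url

    # Download the current page if its extension is one of the targets.
    if any(url.endswith(f".{ext}") for ext in targets):
        await download_file(url)  # hypothetical helper

    # Keep crawling and forward the targets to newly enqueued requests.
    await context.enqueue_links(user_data={"targets": targets})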
Hi @Mantisus, your solution worked! The only problem is that the targets only stick to the first crawl and then disappear on the next URL.
Attachment: image.png
Hi @HonzaS, I'm actually trying to make a POST request using FastAPI that accepts a JSON body containing the URL and the target extensions of the files I want to download from every page I crawl. For example, in my JSON body I inserted:
on my JSON body I inserted
{
"url": "https://crawlee.dev/python/",
"targets": ["html", "pdf"]
}

My program will crawl the URL provided in the JSON body and then download every file that has the html or pdf extension; I can add more extensions, like png or jpg.
Hi everyone! Glad to say this finally worked! I fixed the last problem I encountered by adding the following to my enqueue_links call:
await context.enqueue_links(user_data={"targets": targets})

Thank you to everyone who answered! πŸ˜„