Apify

Apify and Crawlee Official Forum

b
F
A
J
A
Members
frankman
f
frankman
Offline, last seen 2 weeks ago
Joined September 17, 2024
I'm conducting reverse engineering and have discovered a link that retrieves all the data I need using the POST method. I've copied the request as cURL to analyze the parameters required for making the correct request.

I've modified the parameters to make the request using the POST method. I've successfully tested this using httpx, but now I want to implement it using the Crawlee framework.

How can I change the method used by the HTTP client to retrieve the data, and how can I pass the modified parameters I've prepared?

Additionally, if anyone has experience, I'd appreciate any insights on handling POST requests within this framework.

Thanks
1 comment
M
Hi. I'm extracting prices of products. In the process, I have the main page where I can extract all the information I need except for the fees. If I go through every product individually, I can get the price and fees, but sometimes I lose the fee information because I get blocked on some products. I want to handle this situation. If I extract the fees, I want to add them to my product_item, but if I get blocked, I want to pass this data as empty. I'm using the "Router" class as the Crawlee team explains here: https://crawlee.dev/python/docs/introduction/refactoring. When I add my URL extracted from the first page as shown below, I cannot pass data extracted before:

await context.enqueue_links(url='product_url', label='PRODUCT_WITH_FEES')

I want something like this:

await context.enqueue_links(url='product_url', label='PRODUCT_WITH_FEES', data=product_item # type: dict)

But I cannot do the above. How can I do it?

So, my final data will be showed as:

If I handle the data correctly I want something like this:
product_item = {product_id: 1234, price: 50$, fees: 3$}

If I get blocked, I have something like this:
product_item = {product_id: 1234, price: 50$, fees: ''}
4 comments
f
M