Pass args to handler

At a glance

The community member has a crawler that scrapes multiple websites, each with multiple URLs. They want to send the scraped data directly to a database instead of storing it on the EC2 instance. The community member asks if there is a way to pass additional arguments to the @router.default_handler function when running the crawler. A community member suggests using Request.from_url to pass arbitrary arguments, which can then be accessed in the handler via context.request.user_data. The original community member confirms that this is the solution they were looking for.

Nicolay

Hey

I have a crawler which scrapes a lot of different websites, each with multiple urls.

Each website has an associated id that I need for the dataset.

So I want to scrape the urls, get the data but then instantly send it to a database, so I don't have to keep it on the EC2 instance.

Is there a way to pass extra variables to @router.default_handler?

for company in valid_company_urls:
    crawler = await create_crawler(config, company)

    # Run crawler for this company's URLs
    await crawler.run(company['url'])

When I do something like this, how could I pass additional arguments to run that are then passed to the handler?

I have not found anything in the docs.

Thanks for any hints!

3 comments

Asuha

Not 100% sure if that is what you mean (or if it is the best solution), but you can pass arbitrary arguments to Request.from_url, which you can then read in the handler. For example, this is what I do:
for xxx in xxx:
    Request.from_url(
        url=url,
        label=xxx,
        user_data={
            'abc': abc,
            'def': def_,  # value renamed: a bare "def" is a reserved word in Python
        },
    )

And then access it in the handler via context.request.user_data['abc']
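To tie this back to the original question, here is a minimal sketch of how the company id could travel with each request and reach the handler. It is an illustration under several assumptions, not a definitive implementation: the `id` and `urls` keys on each company record, the helper `build_request_specs`, and the stand-in `save_to_database` are all hypothetical, and the Crawlee import paths may differ between versions.

```python
def build_request_specs(companies):
    """Turn company records into one request spec per URL.

    Each spec carries the company id in ``user_data`` so the handler can
    attach it to whatever it scrapes. The ``id`` and ``urls`` keys are
    assumed for illustration.
    """
    specs = []
    for company in companies:
        for url in company["urls"]:
            specs.append({"url": url, "user_data": {"company_id": company["id"]}})
    return specs


async def save_to_database(record):
    """Hypothetical stand-in for a real database write."""
    print("would store:", record)


async def main(companies):
    # Imported here so the helper above stays usable without Crawlee installed.
    # These module paths are an assumption and may differ between versions.
    from crawlee import Request
    from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        # Read back the value that was attached when the request was built.
        company_id = context.request.user_data["company_id"]
        title = context.soup.title.string if context.soup.title else None
        await save_to_database(
            {"company_id": company_id, "url": context.request.url, "title": title}
        )

    # Build Request objects that carry user_data, then run one crawler for all.
    requests = [Request.from_url(**spec) for spec in build_request_specs(companies)]
    await crawler.run(requests)
```

Running it would look like asyncio.run(main(companies)) after importing asyncio. Because the id rides along on each request, this sketch uses a single crawler for all companies rather than creating one crawler per company.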

Nicolay

Nice! That’s exactly it!

Nicolay

Thanks!


Apify Discord Mirror
