Hi
I'm making a simple app that gets updated information from a website.
It runs inside a FastAPI app and uses AsyncIOScheduler to run the script every day.
The issue is that once the crawler has visited the main page, it will not revisit that page on the next scheduled run.
I've done a lot of research but couldn't find a solution. Other scrapers have something like a force= parameter to force the scrape.
How can we force the request back to the UNPROCESSED state?
Here is the code:
class Scraper:
    async def run_scraper(self):
        proxy_urls = process_proxy_file('proxy_list.txt')
        proxy_configuration = ProxyConfiguration(proxy_urls=proxy_urls)
        crawler = PlaywrightCrawler(proxy_configuration=proxy_configuration, headless=False, browser_type='chromium')

        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            print('Handling request...')
            context.request.state(RequestState.UNPROCESSED)
            # Scrape logic here
            # Return scraped data if needed

        request = Request.from_url('https://crawlee.dev')
        await crawler.run([request])
        return "Example Scraped Data"
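
To illustrate the behavior I'm seeing, here is a toy model in plain Python. It is only a sketch of my assumption that the queue deduplicates requests by a key derived from the URL; `ToyQueue` and its `add` method are hypothetical and are not Crawlee's actual implementation. If that assumption holds, what I'm looking for is a supported way to give the same URL a fresh key on each scheduled run (possibly something like the `unique_key` field on `Request`, but I have not confirmed that).

```python
import uuid
from typing import Optional


class ToyQueue:
    """Hypothetical stand-in for a request queue that deduplicates by key."""

    def __init__(self) -> None:
        self._seen = set()   # keys of requests that were already accepted
        self.pending = []    # URLs waiting to be crawled

    def add(self, url: str, unique_key: Optional[str] = None) -> bool:
        # Assumption: by default the key is derived from the URL itself.
        key = unique_key if unique_key is not None else url
        if key in self._seen:
            return False  # same key as an earlier run, so the request is skipped
        self._seen.add(key)
        self.pending.append(url)
        return True


queue = ToyQueue()
first = queue.add('https://crawlee.dev')   # first daily run: accepted
second = queue.add('https://crawlee.dev')  # next run, same key: skipped
# A fresh key per run makes the same URL look new to the queue.
third = queue.add('https://crawlee.dev', unique_key=f'https://crawlee.dev#{uuid.uuid4()}')
print(first, second, third)
```

In this toy model the second call is dropped, which matches what I see on the second scheduled run, while the third call goes through because its key is new.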