Apify does not enforce robots.txt rules by default. This is deliberate: Apify focuses on providing flexibility for web scraping and automation, and some use cases may require bypassing these rules (within the bounds of legality and ethics). So even if the robots.txt setting is "TRUE", the rules are not applied automatically unless your code explicitly handles them.
You can enforce robots.txt rules manually by adding the logic to your actor. For example, in Node.js you can use a library such as robots-txt-guard to parse robots.txt and respect its restrictions before requesting data from a website.
Here's a basic approach (see the sketch after this list):
- Parse the robots.txt file from the target site.
- Check whether your actor is allowed to scrape specific endpoints.
- Proceed with the request only if the check allows it; otherwise skip the URL.
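For instance, a minimal Node.js sketch using robots-txt-guard together with its companion parser robots-txt-parse might look like the following. The `isAllowed(userAgent, path)` call, the parsed-structure handoff between the two packages, and the `my-actor-bot` user agent are assumptions based on the packages' documented usage; verify them against the versions you install.

```js
// Minimal sketch: check robots.txt before scraping a URL.
// Assumes Node 18+ (global fetch) and the robots-txt-guard /
// robots-txt-parse packages; check their docs for exact APIs.
const guard = require('robots-txt-guard');
const parse = require('robots-txt-parse');
const { Readable } = require('stream');

// Step 1: fetch and parse the robots.txt file from the target site.
async function loadRobots(origin) {
  const res = await fetch(new URL('/robots.txt', origin));
  // Treat a missing robots.txt as "everything allowed".
  if (!res.ok) return null;
  const body = await res.text();
  // robots-txt-parse consumes a readable stream and resolves to the
  // grouped rule structure that robots-txt-guard expects.
  const parsed = await parse(Readable.from([body]));
  return guard(parsed);
}

async function main() {
  const robots = await loadRobots('https://example.com');
  const path = '/some/endpoint'; // hypothetical endpoint, for illustration

  // Step 2: check whether this user agent may scrape the endpoint.
  const allowed = !robots || robots.isAllowed('my-actor-bot', path);

  // Step 3: proceed only if the check allows it.
  if (allowed) {
    console.log(`Scraping ${path}`); // e.g. enqueue it for your crawler
  } else {
    console.log(`Skipping ${path}: disallowed by robots.txt`);
  }
}

main();
```

In a real actor you would typically fetch and parse robots.txt once per host and cache the resulting guard, rather than re-fetching it for every URL.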