
Apify and Crawlee Official Forum


enqueueLinks not respecting strategy

Hello folks, using a basic CheerioCrawler, I'm running into the issue described at https://github.com/apify/crawlee/issues/2525. I specify the same-domain strategy, but the crawler still runs all over the internet.

Does anyone have a workaround that I can use to prevent it from going to outside domains?
9 comments
So your problem is specifically redirects? Then you will need to throw away the page after it is loaded in requestHandler, because enqueueLinks doesn't know what the URL might redirect to once loaded.
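For anyone landing here later, a minimal sketch of that "throw away the page" check, assuming the off-site page is detected by comparing the requested URL with the URL that was actually loaded. The helper names `normalizeHost` and `staysOnSite` are hypothetical and not part of Crawlee, and the host comparison is a deliberate simplification (exact host or subdomain, ignoring a leading `www.`), so it may not match Crawlee's own same-domain logic in every case.

```typescript
// Hypothetical helpers, not part of Crawlee.

// Returns the hostname of a URL without a leading "www." so that
// www.example.com and example.com are treated as the same site.
function normalizeHost(url: string): string {
    return new URL(url).hostname.replace(/^www\./, '');
}

// True when the loaded URL is still on the site the request was queued for.
// Assumption: a missing loadedUrl is treated as "no redirect recorded".
function staysOnSite(requestedUrl: string, loadedUrl: string | undefined): boolean {
    if (!loadedUrl) return true;
    const requested = normalizeHost(requestedUrl);
    const loaded = normalizeHost(loadedUrl);
    // Same host, or the loaded host is a subdomain of the requested one.
    return loaded === requested || loaded.endsWith(`.${requested}`);
}

// Possible use inside a CheerioCrawler requestHandler (sketch only):
//   async requestHandler({ request, enqueueLinks }) {
//       // Redirected off-site: skip the page instead of enqueueing from it.
//       if (!staysOnSite(request.url, request.loadedUrl)) return;
//       await enqueueLinks({ strategy: 'same-domain' });
//   }
```

The same check could be run on URLs produced by expanding short links before they are added to the queue, since a short link's own host says nothing about where it ends up.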
This does not only occur on redirects. While the crawler is scraping a page, if it finds an external link, it does not apply the same-domain logic.

It’s almost like enqueueLinks just switches the strategy to All from SameDomain.
I can provide some sample logs and code in the next few hours once I get to my desk.
I was mistaken. I'm sorry. It was me expanding short links which caused issues.
Hi, regarding this issue: is everything working as expected for you now?
Yes, it is. It was a mistake on my part in how I was handling redirects earlier in the code.
Short links make everything worse. ;D Congrats on figuring it out!
Yes, they do! Thanks!