Apify Discord Mirror

Updated 6 days ago

Redirect Control

I'm trying to make a simple crawler. How do I properly control redirects? Some bad proxies sometimes redirect to an auth page. In that case I want to mark the request as failed if the redirect target URL contains something like /auth/login. What's the best way to handle this scenario and abort the request early?
5 comments
So each request is a session? Say I send 3 URLs to crawl: would this mark them all as failed once the session is marked as bad? I think I might have explained myself incorrectly. This still lets the page navigate to the auth login page. My question was whether it's possible to prevent a redirect on the main document and retire the session if one happens.
Sessions are defined by the session pool, so when a request is blocked, mark its session as "bad" so that other requests do not keep using it.
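To illustrate why marking one session as bad does not fail every URL in the crawl, here is a toy model of a session pool. It is only a sketch and not Crawlee's implementation: the `ToySession` and `ToySessionPool` classes, the threshold of 3, and the selection rule are all assumptions made for illustration. The idea it shows is that `markBad()` adds a strike to one session, `retire()` takes that session out of rotation immediately, and other sessions stay usable.

```javascript
// Toy model only: NOT Crawlee's implementation.
// The default threshold of 3 strikes is an assumption for illustration.
class ToySession {
  constructor(id, maxErrorScore = 3) {
    this.id = id;
    this.errorScore = 0;
    this.maxErrorScore = maxErrorScore;
  }

  // One unsuccessful use: add a strike, the session may still be reused
  markBad() {
    this.errorScore += 1;
  }

  // Known-bad session: jump straight to the threshold so it is never reused
  retire() {
    this.errorScore = this.maxErrorScore;
  }

  isUsable() {
    return this.errorScore < this.maxErrorScore;
  }
}

class ToySessionPool {
  constructor(size) {
    this.sessions = Array.from({ length: size }, (_, i) => new ToySession(`session_${i}`));
  }

  // Return the first usable session, or undefined when none are left
  pickUsable() {
    return this.sessions.find((s) => s.isUsable());
  }
}
```

In this model a request that lands on the auth page would call `retire()` on its own session, and the next request would simply get a different session from `pickUsable()`.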
You can do something like this:
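  // Option 1: Use the failedRequestHandler (runs only after all retries are exhausted)
  // In Crawlee v3 the error is the second argument, not a property of the context.
  failedRequestHandler: async ({ request, session }, error) => {
    // request.url is the URL that was requested; request.loadedUrl is the final URL after redirects
    if (error.message.includes('/auth/login') || request.loadedUrl?.includes('/auth/login')) {
      console.log(`Request redirected to auth page: ${request.loadedUrl ?? request.url}`);
      // Mark the proxy as bad if you're using a session pool
      if (session) {
        session.markBad();
      }
      // You can retry with a different proxy if needed. Re-adding the same URL is
      // deduplicated by uniqueKey, so a retry needs a new one (destructure `crawler` first):
      // await crawler.addRequests([{ url: request.url, uniqueKey: `${request.url}#retry` }]);
    }
  },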
  
  // Option 2: Handle redirects in the request handler
  requestHandler: async ({ request, $, crawler, session }) => {
    // Check if we were redirected to an auth page.
    // request.loadedUrl holds the final URL after redirects; request.url is the original one.
    if (request.loadedUrl?.includes('/auth/login')) {
      console.log(`Detected auth redirect: ${request.loadedUrl}`);
      // Mark the session as bad (or call session.retire() to stop reusing it immediately)
      if (session) {
        session.markBad();
      }
      // Throw an error to fail this request so it is retried
      throw new Error('Redirected to auth page');
    }

    // Your normal processing code if not redirected
    // ...
  },
  
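  // Option 3: Use the preNavigationHooks (Playwright only; page.route is not a Puppeteer API)
  preNavigationHooks: [
    async ({ page, session }) => {
      // Set up request interception. Per the Playwright docs, the handler is only called
      // for the first URL of an HTTP redirect chain, so this catches client-side redirects
      // (JavaScript or meta refresh) but not 3xx hops. Keep the Option 2 check as a fallback.
      await page.route('**', async (route) => {
        const req = route.request();
        // Only react to navigations of the main document, not images, XHR, or iframes
        const isMainNavigation = req.isNavigationRequest() && req.frame() === page.mainFrame();
        if (isMainNavigation && req.url().includes('/auth/login')) {
          console.log(`Intercepted auth redirect: ${req.url()}`);
          // Mark the session as bad (or call session.retire() to stop reusing it immediately)
          if (session) {
            session.markBad();
          }
          // Abort the navigation. Throwing here would not fail the crawler request,
          // because the error would be raised inside the route handler, not in the hook.
          await route.abort();
        } else {
          await route.continue();
        }
      });
    }
  ],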
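The three options above all repeat the same `/auth/login` substring check. A small helper could keep that check in one place and compare only the path of the final URL, so that a query string such as `?q=/auth/login` does not trigger a false positive. This is a minimal sketch: `isAuthRedirect` and `AUTH_PATH_PATTERN` are hypothetical names, not part of Crawlee, and the pattern is an assumption you would adjust to the target site.

```javascript
// Hypothetical helper, not part of Crawlee.
// Assumption: the auth page lives at a path ending in or continuing past /auth/login.
const AUTH_PATH_PATTERN = /\/auth\/login(\/|$)/;

/**
 * Returns true when the final (loaded) URL points at the auth page.
 * requestedUrl is used only as the base for resolving a relative loadedUrl.
 */
function isAuthRedirect(requestedUrl, loadedUrl) {
  if (!loadedUrl) return false; // nothing was loaded, so nothing to compare
  let finalPath;
  try {
    finalPath = new URL(loadedUrl, requestedUrl).pathname;
  } catch {
    return false; // unparsable URL: treat as "not an auth redirect"
  }
  // Test the path only, so query strings and fragments are ignored
  return AUTH_PATH_PATTERN.test(finalPath);
}
```

Inside a request handler this could be used as `if (isAuthRedirect(request.url, request.loadedUrl)) { session?.retire(); throw new Error('Redirected to auth page'); }`, assuming the session pool is enabled.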