Apify and Crawlee Official Forum

Abdul
Joined October 14, 2024
I'm reaching out to the community for some assistance with an issue I'm encountering while using Playwright and Apify. I've built a scraper that incorporates proxy rotation with sessions, but I'm facing a problem where the proxy isn't changing when the session changes. Additionally, I'd like to ensure the same IP address is used for both the proxy and the captcha solver.

Problem Description:

Playwright script utilizes Apify's proxy configuration with session management.
Proxy rotation is set to occur every 10 downloads.
Session IDs are dynamically generated as my_session{index}.
Despite session switching messages, the proxy server remains the same.
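
The rotation described above can be sketched as a small pure function. This is only an illustration, not the code from the scraper: the name `session_id_for` and its parameters are hypothetical, the default of 10 downloads per session comes from the description, and the pool of 30 sessions comes from `total_proxies` in the snippet.

```python
def session_id_for(download_count: int, rotate_every: int = 10, total_sessions: int = 30) -> str:
    """Return the session ID to use after `download_count` completed downloads.

    Hypothetical helper: `rotate_every` and `total_sessions` are assumed values.
    """
    # Integer division groups downloads into blocks of `rotate_every`;
    # the modulo wraps back to the first session once the pool is exhausted.
    index = (download_count // rotate_every) % total_sessions + 1
    return f"my_session{index}"
```

Keeping the index calculation separate from the download loop makes it easy to check when a switch happens, independent of any proxy calls.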

Switching to proxy session: my_session1 with {'server': 'http://10.0.94.249:8011', 'username': 'groups-auto,session-my_session1', '}

Switching to proxy session: my_session2 with {'server': 'http://10.0.94.249:8011', 'username': 'groups-auto,session-my_session2', '}

Switching to proxy session: my_session3 with {'server': 'http://10.0.94.249:8011', 'username': 'groups-auto,session-my_session3', }

Code Snippet:

# Create proxy configuration
proxy_configuration = await Actor.create_proxy_configuration(
    groups=['auto']
)

...


# Proxy and session management logic
total_proxies = 30
download_count = 0
proxy_session_index = 0
results = []

for index, manual in enumerate(manuals[:max_pdfs], start=1):
    # ... (download logic)

    if download_count % 6 == 0:
        proxy_session_index += 1
        if proxy_session_index > total_proxies:
            proxy_session_index = 1
            Actor.log.info("Used all proxies. Sleeping before restarting.")
            await asyncio.sleep(10)

        session_id = f'my_session{proxy_session_index}'
        proxy_url = await proxy_configuration.new_url(session_id)
        proxy_settings = get_playwright_proxy_settings(proxy_url)
        Actor.log.info(f"Switching to proxy session: {session_id} with server {proxy_settings['server']}")
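
The `get_playwright_proxy_settings` helper is not shown in the post. A minimal sketch of what it might look like, assuming it only splits the proxy URL into the `server`, `username` and `password` fields that Playwright's `proxy` option accepts. The example URLs and the `PASSWORD` placeholder are made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def get_playwright_proxy_settings(proxy_url: str) -> dict:
    """Split a proxy URL into a dict shaped like Playwright's `proxy` option.

    Assumed implementation; the original helper is not shown.
    """
    parts = urlsplit(proxy_url)
    # The server is only scheme, host and port; credentials go in separate keys.
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


# Hypothetical URLs for two sessions; only the username part differs.
first = get_playwright_proxy_settings(
    "http://groups-auto,session-my_session1:PASSWORD@10.0.94.249:8011"
)
second = get_playwright_proxy_settings(
    "http://groups-auto,session-my_session2:PASSWORD@10.0.94.249:8011"
)
```

With a helper like this, `server` is the same for both sessions and only `username` changes, which matches the log lines above. The session is carried in the proxy username, so an unchanged `server` value does not by itself show that the outgoing IP stayed the same.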