Long-running Playwright jobs on Multilogin profiles tend to fail with Target closed, Browser closed, or silent WebSocket drops, especially after 30–90 minutes of runtime or when Mimic runs out of memory. Cookies live in the profile, so recovery usually means reconnecting or restarting the profile, not logging in again. This recipe wraps the CDP attach with recovery logic.
Failure modes
| Error | Cause | Recovery |
|---|---|---|
| Target closed | Tab crashed, OOM | Reconnect CDP; reuse context if browser alive |
| Browser closed | Mimic killed, API stop | Call profile/start again to get a new CDP URL |
| WS hang | Network blip, idle timeout | Timeout + restart profile |
| Stale CDP URL | Profile restarted elsewhere | Always fetch fresh URL from API |
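The table can be read as a small dispatch rule. The sketch below is one way to express it, assuming the failure is identifiable from the exception message; the `Recovery` enum and `classify_failure` function are hypothetical names, and the matched strings are assumptions taken from the table, since the exact wording differs between Playwright versions.

```python
import asyncio
from enum import Enum


class Recovery(Enum):
    RECONNECT = "reconnect"          # re-attach over CDP, browser still alive
    RESTART_PROFILE = "restart"      # stop + start the profile, fetch a new CDP URL
    RAISE = "raise"                  # not a connection failure; let it propagate


def classify_failure(exc: BaseException) -> Recovery:
    """Map an exception to a recovery action (hypothetical helper)."""
    # A hung WebSocket only surfaces as a timeout imposed by the caller.
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return Recovery.RESTART_PROFILE
    msg = str(exc).lower()
    # Assumed message fragments; adjust to what your Playwright version emits.
    if "browser closed" in msg or "browser has been closed" in msg:
        return Recovery.RESTART_PROFILE
    if "target closed" in msg:
        return Recovery.RECONNECT
    return Recovery.RAISE
```

Anything the function does not recognise is re-raised rather than retried, so application bugs are not hidden behind profile restarts.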
Reconnect wrapper (Python + Playwright)
```python
import asyncio

import httpx
from playwright.async_api import async_playwright, Error as PwError

# Endpoint paths and response keys follow this recipe; verify them against
# the Multilogin API version you run.
MLX = "https://api.multilogin.com"
TOKEN = "Bearer ..."
HEADERS = {"Authorization": TOKEN}


class CdpSession:
    def __init__(self, profile_id: str):
        self.profile_id = profile_id
        self._client = httpx.AsyncClient(timeout=90, headers=HEADERS)
        self._browser = None
        self._pw = None

    async def start(self):
        self._pw = await async_playwright().start()
        cdp = await self._launch()
        self._browser = await self._pw.chromium.connect_over_cdp(cdp, timeout=60_000)

    async def _launch(self) -> str:
        """Start the profile and return a fresh CDP URL (never cache it)."""
        r = await self._client.post(
            f"{MLX}/profile/start",
            json={"profile_id": self.profile_id, "headless": False},
        )
        r.raise_for_status()
        data = r.json()
        cdp = data.get("cdp_url") or data.get("wsUrl")
        if not cdp:
            raise RuntimeError("No CDP URL")
        return cdp

    async def page(self):
        # contexts[0] is the profile's own context and holds its cookies;
        # a context created with new_context() starts empty.
        ctx = self._browser.contexts[0] if self._browser.contexts else await self._browser.new_context()
        return ctx.pages[0] if ctx.pages else await ctx.new_page()

    async def recover(self):
        """Drop the dead connection, restart the profile, and reattach."""
        try:
            if self._browser:
                await self._browser.close()
        except PwError:
            pass  # connection is already gone
        await self._client.post(f"{MLX}/profile/stop", json={"profile_id": self.profile_id})
        await asyncio.sleep(2)  # give the profile time to shut down
        cdp = await self._launch()
        self._browser = await self._pw.chromium.connect_over_cdp(cdp, timeout=60_000)

    async def close(self):
        try:
            if self._browser:
                await self._browser.close()
        except PwError:
            pass  # browser already dead; still stop the profile below
        finally:
            await self._client.post(f"{MLX}/profile/stop", json={"profile_id": self.profile_id})
            await self._client.aclose()
            if self._pw:
                await self._pw.stop()


async def with_retry(session: CdpSession, fn, max_retries=2):
    """Run fn(page); on a Playwright error, recover and run fn again from the start."""
    for attempt in range(max_retries + 1):
        try:
            page = await session.page()
            return await fn(page)
        except PwError:
            if attempt == max_retries:
                raise
            await session.recover()
```
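`with_retry` only reacts to Playwright errors, so a hung WebSocket that never raises would block the job indefinitely. One way to cover the "WS hang" row of the table is to bound each attempt with `asyncio.wait_for`. The sketch below assumes a session object exposing async `page()` and `recover()` methods like `CdpSession`; the name `with_timeout_retry`, the `retry_on` parameter, and the 600-second default are illustrative choices, not part of the original wrapper.

```python
import asyncio


async def with_timeout_retry(session, fn, *, attempt_timeout=600.0,
                             max_retries=2, retry_on=()):
    """Like with_retry, but also treats a stalled attempt as recoverable.

    retry_on: extra exception types to recover from, e.g. (PwError,).
    """
    recoverable = (asyncio.TimeoutError, *retry_on)

    async def attempt_once():
        page = await session.page()
        return await fn(page)

    for attempt in range(max_retries + 1):
        try:
            # wait_for cancels the attempt and raises TimeoutError on expiry.
            return await asyncio.wait_for(attempt_once(), timeout=attempt_timeout)
        except recoverable:
            if attempt == max_retries:
                raise
            await session.recover()
```

The timeout has to exceed the longest legitimate run of `fn`; for multi-hour jobs it is usually easier to apply it per step than per job.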
Usage
```python
async def job(page):
    await page.goto("https://seller.example/dashboard", wait_until="domcontentloaded")
    # ... long task ...


async def main():
    s = CdpSession("profile-uuid")
    await s.start()
    try:
        await with_retry(s, job)
    finally:
        await s.close()


if __name__ == "__main__":
    asyncio.run(main())
```
OpenTelemetry correlation
Emit a child span on every reconnect so Grafana Tempo shows **why** jobs slow down. This pairs with the OpenTelemetry queue recipe.
```python
from opentelemetry import trace

tracer = trace.get_tracer("mlx.cdp")


class TracedCdpSession(CdpSession):
    async def recover(self):
        # One child span per recovery; exceptions are recorded on the span.
        with tracer.start_as_current_span("mlx.cdp.recover") as span:
            span.set_attribute("mlx.profile_id", self.profile_id)
            span.add_event("cdp_target_closed")
            await super().recover()  # stop profile, relaunch, reattach CDP
            span.set_attribute("mlx.recover_success", True)
```
Alert when a single job records more than three mlx.cdp.recover spans. That rate points to Mimic OOM or an unstable proxy rather than a platform ban.
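The same threshold can also be enforced in-process, so a job stops restarting a profile that keeps crashing. This is a minimal sketch: `RecoverBudget` and `RecoverBudgetExceeded` are hypothetical names, and the limit of 3 simply mirrors the alert threshold.

```python
class RecoverBudgetExceeded(RuntimeError):
    """Raised when a job has recovered more often than the budget allows."""


class RecoverBudget:
    """Per-job counter; create one instance per job run."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.count = 0

    def record(self) -> int:
        """Count one recovery; raise once the count exceeds the limit."""
        self.count += 1
        if self.count > self.limit:
            raise RecoverBudgetExceeded(
                f"{self.count} recoveries in one job (limit {self.limit})"
            )
        return self.count
```

Calling `budget.record()` at the top of `recover()` turns the fourth recovery into a hard failure that the queue worker can route to a human.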
Design notes
- Checkpoint progress: persist the step index to Redis so a reconnect resumes mid-workflow.
- Do not call login() again if the profile cookies are intact; verify by loading the dashboard URL first.
- Lease + queue: combine with a queue worker so only one recoverer owns a profile at a time.
- Log the recover count: a high rate points to RAM pressure, proxy trouble, or Mimic crashes (see the debug runbook).
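The checkpointing note can be sketched as a list of step coroutines plus a stored index. The example assumes a key-value store with synchronous `get`/`set` methods; `MemoryStore` is an in-memory stand-in for a Redis client, and `run_checkpointed` and the key format are hypothetical.

```python
import asyncio


class MemoryStore:
    """In-memory stand-in for a Redis-like client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


async def run_checkpointed(job_id, steps, page, store):
    """Run steps in order, skipping those already completed for job_id."""
    key = f"job:{job_id}:step"
    start = int(store.get(key) or 0)  # index of the first step not yet done
    for index in range(start, len(steps)):
        await steps[index](page)
        # Persist only after the step succeeded, so a crash repeats it.
        store.set(key, index + 1)
```

Passed to `with_retry` as `lambda page: run_checkpointed(job_id, steps, page, store)`, a reconnect then resumes at the step that failed instead of starting over. Each step still needs to be safe to repeat once, since a crash can land between the step finishing and the index being saved.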
Disclosure: MLX-MMO is affiliated with Multilogin.