GatewayClient is a reconnecting WebSocket client that handles version negotiation, exponential backoff with jitter, and event-sequence gap detection — so your integration stays connected without you writing the socket loop.
Quick Start
How It Works
| Stage | What happens |
|---|---|
| Connect | TCP/WS opened; join includes min_version, max_version, optional agent_id, token |
| Negotiate | Server picks min(client_max, MAX_PROTOCOL_VERSION) or rejects with version_unsupported |
| Stream | Each event carries a monotonic sequence; client tracks _expected_sequence |
| Drop | on_state_change("reconnecting") fires; backoff with jitter (initial * multiplier^attempts, clamped to max) |
| Resume | Re-join with session_id + since=cursor; server replays from cursor; presence + health returned |
| Gap | If received sequence ≠ expected, on_gap(expected, received) fires; caller can await client.resync() |
Configuration Options
GatewayClient constructor:
| Option | Type | Default | Description |
|---|---|---|---|
| url | str | — | WebSocket URL (e.g. ws://localhost:8765) |
| agent_id | str | — | Agent ID to join as |
| token | Optional[str] | None | Auth token, appended as ?token= query param |
| reconnect | bool | True | Auto-reconnect on disconnect |
| backoff | Optional[BackoffConfig] | BackoffConfig() | Backoff configuration |
| max_reconnect_attempts | Optional[int] | None (infinite) | Max reconnect attempts before giving up |
BackoffConfig fields:
| Option | Type | Default | Description |
|---|---|---|---|
| initial | float | 1.0 | Initial delay in seconds |
| max | float | 30.0 | Maximum delay in seconds |
| multiplier | float | 2.0 | Backoff multiplier per attempt |
| jitter | float | 0.2 | Random jitter factor (0–1) |
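The delay schedule described by these fields can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the `BackoffConfig` dataclass below is a stand-in that only mirrors the documented fields, `backoff_delay` is a hypothetical helper, and the way jitter is applied (a symmetric spread of plus or minus `jitter * delay`) is an assumption, since the documentation only states the jitter factor's range.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackoffConfig:
    """Stand-in mirroring the documented fields; not the library's class."""
    initial: float = 1.0
    max: float = 30.0
    multiplier: float = 2.0
    jitter: float = 0.2


def backoff_delay(cfg: BackoffConfig, attempts: int,
                  rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retry number `attempts` (0-based)."""
    rng = rng if rng is not None else random.Random()
    # Documented formula: initial * multiplier^attempts, clamped to max.
    base = min(cfg.initial * cfg.multiplier ** attempts, cfg.max)
    # Assumption: jitter spreads the delay symmetrically around the base value.
    spread = base * cfg.jitter
    return max(0.0, base + rng.uniform(-spread, spread))
```

With the defaults and jitter disabled, the base delays would run 1, 2, 4, 8, 16 seconds and then stay at the 30-second cap.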
Connection states (ConnectionState):
| State | Meaning |
|---|---|
| disconnected | No active connection |
| connecting | Attempting initial connection |
| connected | Joined and streaming events |
| reconnecting | Waiting before next retry |
Callback attributes:
| Attribute | Signature | When it fires |
|---|---|---|
| on_gap | Callable[[int, int], None] | A monotonic sequence gap is detected (expected, received) |
| on_state_change | Callable[[str], None] | Connection state changes |
Protocol Version Negotiation
The client and server negotiate a protocol version during the join handshake.
- Constants in protocols.py: PROTOCOL_VERSION = 1, MIN_PROTOCOL_VERSION = 1, MAX_PROTOCOL_VERSION = 1.
- Client sends min_version and max_version in the join message.
- Server replies in joined with protocol_version, server_min_version, server_max_version.

Invalid min_version/max_version fields (non-integer, or min > max) produce code: "invalid_protocol_hello" and raise ConnectionError.
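The server-side selection rule can be sketched as below. This is a hedged illustration under stated assumptions, not the gateway's code: `negotiate` is a hypothetical function, the `"type"` key and the dict shapes are invented for the example, and the overlap check (rejecting when the chosen version falls below either side's minimum) is one plausible reading of "rejects with version_unsupported".

```python
MIN_PROTOCOL_VERSION = 1
MAX_PROTOCOL_VERSION = 1


def negotiate(min_version, max_version):
    """Return a joined-style dict, or an error dict, for a client hello."""
    for value in (min_version, max_version):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return {"type": "error", "code": "invalid_protocol_hello"}
    if min_version > max_version:
        return {"type": "error", "code": "invalid_protocol_hello"}

    # Documented rule: pick min(client_max, MAX_PROTOCOL_VERSION).
    chosen = min(max_version, MAX_PROTOCOL_VERSION)
    # Assumption: the pick must also satisfy both minimums, else reject.
    if chosen < max(min_version, MIN_PROTOCOL_VERSION):
        return {"type": "error", "code": "version_unsupported"}

    return {
        "type": "joined",
        "protocol_version": chosen,
        "server_min_version": MIN_PROTOCOL_VERSION,
        "server_max_version": MAX_PROTOCOL_VERSION,
    }
```

Under these assumptions, a client advertising versions 1 to 3 would settle on version 1, while a client requiring at least version 2 would be rejected.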
Gap Detection
Every event carries a monotonic sequence field; the client tracks _expected_sequence and fires on_gap when the received sequence does not match the expected one.
client.resync() resets the cursor to 0 and reconnects, triggering a full state reload from the server.
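A minimal sketch of the tracking logic, for illustration only. `SequenceTracker` and `observe` are hypothetical names, and two details are assumptions rather than documented behaviour: that sequences start at 1, and that after a gap the tracker continues from the received sequence.

```python
from typing import Callable, Optional


class SequenceTracker:
    """Tracks the next expected sequence and reports mismatches."""

    def __init__(self, on_gap: Optional[Callable[[int, int], None]] = None,
                 start: int = 1):
        self._expected_sequence = start  # assumption: first event is sequence 1
        self.on_gap = on_gap

    def observe(self, sequence: int) -> bool:
        """Record one event; return True if it arrived in order."""
        in_order = sequence == self._expected_sequence
        if not in_order and self.on_gap is not None:
            # Same argument order as the documented callback: (expected, received).
            self.on_gap(self._expected_sequence, sequence)
        # Assumption: continue from whatever was actually received.
        self._expected_sequence = sequence + 1
        return in_order
```

In a real integration the `on_gap` handler would typically schedule `client.resync()` or the application's own snapshot path rather than just record the gap.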
Resume Snapshot
When reconnecting with a stored session_id and since=cursor, the server replies with a single joined payload that restores full client state in one round trip.
The joined payload includes:
| Field | Description |
|---|---|
| session_id | Session identifier |
| cursor | Current event cursor position |
| resumed | True if this is a reconnection |
| sequence | Current sequence number (aligned with replay) |
| protocol_version | Negotiated protocol version |
| server_min_version | Server’s minimum supported version |
| server_max_version | Server’s maximum supported version |
| presence | List of presence dicts for all connected clients |
| health | Gateway health dict |
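For code that consumes this payload, a typed container can make the fields easier to work with. The sketch below is illustrative: `JoinedSnapshot` and `from_payload` are hypothetical names, and the field types (string session ID, integer cursor and sequence) are assumptions inferred from the descriptions above.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class JoinedSnapshot:
    """Typed view of the documented joined fields (types are assumed)."""
    session_id: str
    cursor: int
    resumed: bool
    sequence: int
    protocol_version: int
    server_min_version: int
    server_max_version: int
    presence: List[Dict[str, Any]] = field(default_factory=list)
    health: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "JoinedSnapshot":
        # Keep only documented fields; extra keys in the payload are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

A payload that lacks one of the required fields raises `TypeError` here, which surfaces a malformed snapshot early instead of letting partial state through.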
Common Patterns
- Basic Reconnecting Consumer
- Gap Handler with Resync
- Capped Retries
- Authenticated
Network Blip: What Users See
When a network blip occurs, the client handles reconnection silently: users see no errors and events resume from the cursor automatically. The bot keeps responding because:
- Events are buffered until the cursor is confirmed
- The since=cursor on reconnect replays any events delivered while offline
- Users see continuous responses with no error messages
Best Practices
Pick a backoff that matches your network
The defaults (initial=1.0, max=30.0, multiplier=2.0) suit residential and cellular networks. For LAN-only deployments with fast recovery, tighten initial to 0.1 and max to 5.0.
Treat version_unsupported as a permanent failure
connect() raises ValueError on version_unsupported and stops retrying. Wrap it and alert your ops team: this means the server and client are incompatible and need a coordinated upgrade.
Wire on_gap to your replay or snapshot path
If your app already has its own snapshot mechanism, prefer it over resync(). A targeted snapshot is faster than a full cursor reset.
Pin max_reconnect_attempts in batch jobs
Long-running batch jobs should not retry forever on a dead gateway. Set max_reconnect_attempts so the job fails fast and can be requeued.
Related
Gateway
WebSocket control plane for multi-agent coordination
Gateway Overview
Architecture and deployment patterns
Session Persistence
How sessions survive across processes
Push Notifications
Channel-based pub/sub and delivery guarantees

