Quick Start
Enable with True (simplest)
retry=True enables retry with sensible defaults: 3 retries on a 5 s → 10 s → 20 s exponential schedule, capped at 120 s, with 50% additive jitter.
How It Works
| Aspect | Detail |
|---|---|
| What gets retried | Only LLMError where is_retryable=True (rate limits, overloads) |
| What does NOT get retried | Auth errors, invalid requests, non-retryable LLMError, any other exception |
| Total attempts | max_retries + 1 (default: 4 total) |
| Backoff schedule | min(base_delay × 2^attempt, max_delay) + uniform(0, jitter_ratio × delay) |
| Interruption | Raises RuntimeError("Agent interrupted during retry backoff") immediately |
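The backoff formula in the table can be sketched in a few lines. This is an illustrative reimplementation, not the library's code: the function name backoff_delay and its rng parameter are hypothetical, and it assumes the "delay" inside the jitter term means the already-capped exponential delay.

```python
import random


def backoff_delay(attempt, base_delay=5.0, max_delay=120.0,
                  jitter_ratio=0.5, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper that mirrors the documented formula:
    min(base_delay * 2^attempt, max_delay) + uniform(0, jitter_ratio * delay)
    """
    # Exponential growth, capped at max_delay.
    delay = min(base_delay * (2 ** attempt), max_delay)
    # Additive jitter drawn uniformly from [0, jitter_ratio * delay].
    # Assumption: jitter is computed from the capped delay.
    return delay + rng.uniform(0.0, jitter_ratio * delay)


# With jitter disabled the schedule is deterministic.
schedule = [backoff_delay(i, jitter_ratio=0.0) for i in range(3)]
print(schedule)  # [5.0, 10.0, 20.0]
```

With the default jitter_ratio of 0.5, the first retry under this reading waits somewhere between 5 s and 7.5 s.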
Configuration Options
RetryBackoffConfig Fields
| Option | Type | Default | Description |
|---|---|---|---|
| base_delay | float | 5.0 | Base delay in seconds for the first retry. |
| max_delay | float | 120.0 | Upper cap on any single backoff (after jitter). |
| jitter_ratio | float | 0.5 | Adds uniform(0, jitter_ratio × delay) on top of exponential delay. Set 0.0 to disable jitter. |
| max_retries | int | 3 | Maximum number of retries (so up to 4 total attempts by default). |
A ValueError is raised if:
- base_delay <= 0
- max_delay < base_delay
- jitter_ratio is outside [0, 1]
- max_retries < 0
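To make the validation rules concrete, here is a minimal sketch of a config object that enforces them. BackoffSettings and is_valid are hypothetical stand-ins written for illustration; they are not the library's RetryBackoffConfig, whose error messages and internals may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffSettings:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""
    base_delay: float = 5.0
    max_delay: float = 120.0
    jitter_ratio: float = 0.5
    max_retries: int = 3

    def __post_init__(self):
        # Each check corresponds to one documented ValueError condition.
        if self.base_delay <= 0:
            raise ValueError("base_delay must be > 0")
        if self.max_delay < self.base_delay:
            raise ValueError("max_delay must be >= base_delay")
        if not 0.0 <= self.jitter_ratio <= 1.0:
            raise ValueError("jitter_ratio must be within [0, 1]")
        if self.max_retries < 0:
            raise ValueError("max_retries must be >= 0")


def is_valid(**kwargs):
    """Return True if the given settings pass validation."""
    try:
        BackoffSettings(**kwargs)
        return True
    except ValueError:
        return False


print(is_valid())                  # True
print(is_valid(max_delay=1.0))     # False: max_delay < base_delay
print(is_valid(jitter_ratio=1.5))  # False: outside [0, 1]
```

Note that max_retries=0 passes validation under these rules.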
Precedence
Common Patterns
Rate-limit friendly long jobs
Strict mode — fail fast
Reproducible tests — disable jitter
Observe retries with a hook
OnRetry hook receives:
| Field | Type | Description |
|---|---|---|
| attempt | int | Current attempt number (0-based) |
| max_retries | int | Configured max retries |
| delay_seconds | float | Seconds the agent will sleep before the next attempt |
| error_message | str | String representation of the failing LLMError |
| operation | str | "llm_request" (sync) or "async_llm_request" (async) |
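The sketch below shows how a retry loop might populate these fields and hand them to a hook. Everything here is a self-contained illustration: FakeLLMError, RetryEvent, and call_with_retry are hypothetical names, jitter is omitted for brevity, and calling the hook just before sleeping is an assumption about ordering rather than a statement about the library.

```python
import time
from dataclasses import dataclass


class FakeLLMError(Exception):
    """Stand-in for LLMError; only the is_retryable flag is modeled."""
    def __init__(self, message, is_retryable):
        super().__init__(message)
        self.is_retryable = is_retryable


@dataclass
class RetryEvent:
    """Hypothetical payload carrying the documented hook fields."""
    attempt: int
    max_retries: int
    delay_seconds: float
    error_message: str
    operation: str


def call_with_retry(fn, on_retry, max_retries=3, base_delay=5.0,
                    max_delay=120.0, sleep=time.sleep):
    # max_retries + 1 total attempts, as described in "How It Works".
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except FakeLLMError as err:
            # Non-retryable errors and the final failure propagate to the caller.
            if not err.is_retryable or attempt == max_retries:
                raise
            delay = min(base_delay * 2 ** attempt, max_delay)
            on_retry(RetryEvent(attempt, max_retries, delay, str(err), "llm_request"))
            sleep(delay)


events = []
calls = {"n": 0}


def flaky():
    # Fails twice with a retryable error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeLLMError("rate limited", is_retryable=True)
    return "ok"


result = call_with_retry(flaky, events.append, sleep=lambda s: None)
print(result, [(e.attempt, e.delay_seconds) for e in events])
# ok [(0, 5.0), (1, 10.0)]
```

The hook only records events here; the decision to retry or re-raise stays inside the loop.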
Best Practices
Start with retry=True, tune later
The defaults (base_delay=5.0, max_delay=120.0, jitter_ratio=0.5, max_retries=3) are well-suited to most OpenAI and Anthropic rate-limit patterns. Start with retry=True and only tune when you observe systematic timeouts or excessive waiting.
Don't disable jitter in production
Setting jitter_ratio=0.0 creates a deterministic schedule that is useful for tests but dangerous in production. When many agents share the same API key and all retry at the same second, they hammer the endpoint simultaneously, which is exactly what jitter prevents. Keep jitter_ratio at 0.3 or higher in production.
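A small simulation illustrates the effect. It models 100 clients that all fail at the same moment and computes each client's first-retry delay; the helper first_retry_delay is hypothetical and simply applies the additive-jitter formula to the default base_delay.

```python
import random


def first_retry_delay(rng, base_delay=5.0, jitter_ratio=0.5):
    """Hypothetical helper: first-retry delay with additive uniform jitter."""
    return base_delay + rng.uniform(0.0, jitter_ratio * base_delay)


rng = random.Random(42)  # seeded so the run is repeatable
no_jitter = [first_retry_delay(rng, jitter_ratio=0.0) for _ in range(100)]
with_jitter = [first_retry_delay(rng, jitter_ratio=0.5) for _ in range(100)]

# Without jitter, every client wakes up at exactly the same instant.
print(len(set(no_jitter)))  # 1

# With jitter, wake-up times are spread across the 5.0 s to 7.5 s window.
print(min(with_jitter) >= 5.0, max(with_jitter) <= 7.5)  # True True
```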
Cap max_delay for user-facing flows
A 120-second wait is acceptable for background batch jobs but not when a human is waiting for a response. For interactive agents, set max_delay to something like 20.0 or 30.0, and keep max_retries low (1–2).
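To size these values, it can help to estimate the worst-case time spent sleeping between attempts. The function below is a rough sketch with a hypothetical name; it assumes jitter is added on top of the capped delay and it ignores the duration of the requests themselves.

```python
def worst_case_wait(max_retries, base_delay=5.0, max_delay=120.0, jitter_ratio=0.5):
    """Upper bound on total backoff sleep in seconds (hypothetical helper)."""
    total = 0.0
    for attempt in range(max_retries):
        delay = min(base_delay * 2 ** attempt, max_delay)
        # Worst case: jitter reaches its maximum of jitter_ratio * delay.
        total += delay * (1.0 + jitter_ratio)
    return total


print(worst_case_wait(3))                  # 52.5 (defaults)
print(worst_case_wait(2, max_delay=20.0))  # 22.5 (interactive profile)
```

Under these assumptions the defaults can keep a user waiting close to a minute, while two retries with a lower cap stay under 25 seconds.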
Use the OnRetry hook for observability, not control flow
The OnRetry hook is the right place to log metrics and send alerts. Retries are best-effort: if all attempts fail, the original LLMError propagates to your caller. Build your resilience strategy around catching that exception in your application code, not inside the hook.
Related
Tool Retry Policy
Retry tool calls — a different surface from LLM call retry.
Structured LLM Errors
Which LLMError categories are classified as retryable.
Hook Events
The OnRetry event and all other lifecycle hooks.
Agent Retry Strategies
Strategy guidance for production retry patterns.

