Quick Start
How It Works
On transient errors (503, timeout, model overloaded), the agent retries the same turn against the next model in fallback_models. Successful calls stay on the primary model, so the next turn starts on the primary again.
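A minimal sketch of that retry loop, assuming a hypothetical TransientError that stands in for 503s, timeouts, and overload errors, and a caller-supplied call(model, prompt) function. Neither is the library's API. Because the chain is rebuilt on every call, no fallback state carries over between turns.

```python
from typing import Callable, List, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a 503, timeout, or 'model overloaded' error."""


def run_turn(prompt: str, model: str, fallback_models: Optional[List[str]],
             call: Callable[[str, str], str]) -> str:
    """Try the primary model, then each fallback in order, on transient errors only."""
    chain = [model, *(fallback_models or [])]
    last_error: Optional[Exception] = None
    for name in chain:
        try:
            return call(name, prompt)  # same turn, same prompt, next model
        except TransientError as exc:
            last_error = exc  # transient: move on to the next model
        # Any other exception propagates immediately without trying fallbacks.
    assert last_error is not None  # the chain always contains the primary model
    raise last_error  # every model failed transiently: surface the last error


def fake_call(name: str, prompt: str) -> str:
    # Simulates an overloaded primary model.
    if name == "gpt-4o":
        raise TransientError("503 Service Unavailable")
    return f"{name}: {prompt}"


print(run_turn("hi", "gpt-4o", ["gpt-4o-mini"], fake_call))  # gpt-4o-mini: hi
```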
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| model | str | Required | Primary model name |
| fallback_models | Optional[List[str]] | None | Ordered fallback chain |
| base_url | Optional[str] | None | Custom endpoint (Ollama, etc.) |
| api_key | Optional[str] | None | API key (falls back to environment variables) |
| auth | Optional[Dict[str, str]] | None | Extra auth headers |
Pass the configuration as Agent(llm=LLMConfig(...)) or Agent(model=LLMConfig(...)). See LLM Configuration for endpoint and auth details.
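To make the table concrete, here is a hypothetical dataclass that mirrors its options. It is not the library's LLMConfig; the OPENAI_API_KEY environment variable and the "explicit key wins" rule are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FallbackConfig:
    """Hypothetical stand-in mirroring the options table, not the real LLMConfig."""
    model: str  # required: no default
    fallback_models: Optional[List[str]] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    auth: Optional[Dict[str, str]] = None

    def resolved_api_key(self, env_var: str = "OPENAI_API_KEY") -> Optional[str]:
        # Assumption: an explicit key wins; otherwise read an environment variable.
        return self.api_key or os.environ.get(env_var)

    def chain(self) -> List[str]:
        # Primary first, then fallbacks in the order given.
        return [self.model, *(self.fallback_models or [])]


cfg = FallbackConfig(model="gpt-4o", fallback_models=["gpt-4o-mini"])
print(cfg.chain())  # ['gpt-4o', 'gpt-4o-mini']
```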
Common Patterns
Cost degradation: the primary model is capable and the fallbacks get cheaper, e.g. ["gpt-4o", "gpt-4o-mini"].
Cross-provider resilience: mix OpenAI and Anthropic models so that an outage at one provider does not block the agent.
Custom gateway: combine base_url with fallback_models when your proxy fronts multiple models.
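The patterns above differ mainly in which providers appear in the chain. This sketch uses LiteLLM-style provider/model names and a hypothetical provider_of helper; treating unprefixed names as OpenAI is an assumption, and the model IDs are examples only.

```python
from typing import List, Optional


def provider_of(model: str, default: str = "openai") -> str:
    """Return the provider prefix of a 'provider/model' name, else `default`."""
    prefix, sep, _ = model.partition("/")
    return prefix if sep else default  # assumption: unprefixed means OpenAI


def chain_providers(primary: str, fallbacks: Optional[List[str]] = None) -> List[str]:
    """Providers in the order the chain tries them, duplicates removed."""
    seen: List[str] = []
    for m in [primary, *(fallbacks or [])]:
        p = provider_of(m)
        if p not in seen:
            seen.append(p)
    return seen


# Cost degradation: one provider, cheaper fallback.
cost_chain = ("gpt-4o", ["gpt-4o-mini"])
# Cross-provider resilience, with a cheap same-provider fallback last.
resilient_chain = ("openai/gpt-4o",
                   ["anthropic/claude-3-5-sonnet-20240620", "openai/gpt-4o-mini"])

print(chain_providers(*cost_chain))       # ['openai']
print(chain_providers(*resilient_chain))  # ['openai', 'anthropic']
```

A chain whose provider list has a single entry cannot survive a full outage of that provider.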
Best Practices
Put a cheap same-provider fallback last
This helps with rate limits but not with full provider outages: a cheap model on the same API may fail too if the provider is down.
Order by latency and cost
A fallback runs the same prompt, so a much weaker model may silently return a worse answer rather than failing outright.
Limit chain length to 2–3
Longer chains delay user-visible errors without improving success rates much.
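A quick worked calculation shows the trade-off. It assumes model failures are independent (optimistic, especially within one provider) and uses an illustrative 5% failure rate and 30-second timeout per model; neither number comes from the library.

```python
from math import prod
from typing import List


def success_probability(failure_rates: List[float]) -> float:
    """P(at least one model in the chain answers), assuming independent failures."""
    return 1.0 - prod(failure_rates)


def worst_case_wait(timeouts_s: List[float]) -> float:
    """Seconds before the user sees an error if every model times out in turn."""
    return sum(timeouts_s)


for n in range(1, 5):
    p = success_probability([0.05] * n)
    print(n, round(p, 6), worst_case_wait([30.0] * n))
```

Under these assumptions a second model lifts success from 95% to 99.75%, a third adds only about 0.24 percentage points more, and every extra model adds another full timeout to the worst-case wait.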
Use provider prefixes when mixing
LiteLLM-style names (anthropic/..., openai/...) route credentials correctly across providers.
Related
LLM Configuration
Endpoints, API keys, and auth headers.
Models
Choosing models for agents.
Model Router
Dynamic model selection policies.
Rate Limiter
Throttle requests before they fail.

