Model Fallback - PraisonAI

Model Fallback keeps your agent answering by automatically retrying on alternate models when the primary model is overloaded or unavailable.

Quick Start

One-line resilience

from praisonaiagents import Agent
from praisonaiagents.config import LLMConfig

agent = Agent(
    instructions="You are a helpful assistant",
    llm=LLMConfig(
        model="gpt-4o",  # primary model
        # Tried in order when the primary is overloaded or unavailable
        fallback_models=["claude-3-5-sonnet", "gpt-4o-mini"],
    ),
)
agent.start("Summarise today's news")

Cross-provider chain

Use LiteLLM-style prefixes when mixing providers:

from praisonaiagents import Agent
from praisonaiagents.config import LLMConfig

agent = Agent(
    llm=LLMConfig(
        model="openai/gpt-4o",
        fallback_models=["anthropic/claude-3-5-sonnet", "openai/gpt-4o-mini"],
    ),
)

How It Works

On transient errors (503, timeout, model overloaded), the agent retries the same turn against the next model in fallback_models, in the order listed. When the primary model responds successfully, no fallback is attempted and the call stays on the primary model.
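The retry behaviour described above can be pictured as a simple loop over the chain. The following is a minimal sketch and not PraisonAI's actual implementation: TransientModelError, call_with_fallback, and the call_model callback are hypothetical names introduced for illustration, and the sketch assumes every new call starts again from the primary model.

```python
from typing import Callable, Optional, Sequence, Tuple


class TransientModelError(Exception):
    """Hypothetical stand-in for a 503, a timeout, or an overloaded model."""


def call_with_fallback(
    prompt: str,
    primary: str,
    fallback_models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in order.

    Returns (model_used, response). If every model raises a transient
    error, the last error is re-raised.
    """
    last_error: Optional[TransientModelError] = None
    for model in [primary, *fallback_models]:
        try:
            return model, call_model(model, prompt)
        except TransientModelError as exc:
            last_error = exc  # transient failure: move on to the next model
    assert last_error is not None
    raise last_error


def flaky_call(model: str, prompt: str) -> str:
    """Fake backend for the demo: the primary is always overloaded."""
    if model == "gpt-4o":
        raise TransientModelError("503 model overloaded")
    return f"[{model}] {prompt}"


model_used, reply = call_with_fallback(
    "Hi", "gpt-4o", ["claude-3-5-sonnet", "gpt-4o-mini"], flaky_call
)
print(model_used)  # claude-3-5-sonnet
```

Only errors classified as transient advance the chain in this sketch; any other exception propagates immediately, which matches the idea that fallback is for overload and availability problems rather than for bad requests.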

Configuration Options

Option	Type	Default	Description
`model`	`str`	— (required)	Primary model name
`fallback_models`	`Optional[List[str]]`	`None`	Ordered fallback chain
`base_url`	`Optional[str]`	`None`	Custom endpoint (Ollama, etc.)
`api_key`	`Optional[str]`	`None`	API key (falls back to env vars)
`auth`	`Optional[Dict[str, str]]`	`None`	Extra auth headers

Pass the config as Agent(llm=LLMConfig(...)) or Agent(model=LLMConfig(...)). See LLM Config for endpoint and auth details.

Common Patterns

Cost degradation: the primary is the most capable model and the fallbacks get cheaper, e.g. ["gpt-4o", "gpt-4o-mini"].

Cross-provider resilience: mix OpenAI and Anthropic so that one provider outage does not block the agent.

Custom gateway: combine base_url with fallbacks when your proxy fronts multiple models.
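These patterns can be written out as keyword arguments for LLMConfig. This is a minimal sketch: PATTERNS and full_chain are hypothetical names, the gateway URL is a placeholder, and the cost-degradation list is read here as the full chain (primary first). Only the keys model, fallback_models, and base_url come from the options table.

```python
from typing import Any, Dict, List

# Keyword arguments per pattern; keys mirror the options table above.
PATTERNS: Dict[str, Dict[str, Any]] = {
    "cost_degradation": {
        "model": "gpt-4o",
        "fallback_models": ["gpt-4o-mini"],
    },
    "cross_provider": {
        "model": "openai/gpt-4o",
        "fallback_models": ["anthropic/claude-3-5-sonnet"],
    },
    "custom_gateway": {
        "model": "gpt-4o",
        "fallback_models": ["gpt-4o-mini"],
        "base_url": "http://localhost:4000/v1",  # placeholder proxy URL
    },
}


def full_chain(config: Dict[str, Any]) -> List[str]:
    """Return the models in the order they would be tried."""
    return [config["model"], *(config.get("fallback_models") or [])]


for name, config in PATTERNS.items():
    print(name, "->", full_chain(config))
```

Each dictionary could then be unpacked into the config, for example LLMConfig(**PATTERNS["custom_gateway"]), assuming the constructor accepts the option names listed in the table as keyword arguments.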

Best Practices

Put a cheap same-provider fallback last

This helps with rate limits but not with full provider outages: a cheap model on the same API may still fail if the provider is down.

Order by latency and cost

Fallback runs the same prompt, so a much weaker model may return a worse answer rather than no answer. Order the chain so that each step down is one you are willing to serve.

Limit chain length to 2–3

Longer chains delay user-visible errors without improving success rates much.
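A rough back-of-the-envelope calculation shows why extra fallbacks bring diminishing returns. The sketch below assumes each model fails independently with the same probability and that every failed attempt costs one full timeout. Both are simplifying assumptions, and the 10% failure rate and 30 second timeout are illustrative numbers, not measurements. The chain length here counts every model tried, including the primary.

```python
def chain_success_probability(per_model_failure: float, chain_length: int) -> float:
    """Probability that at least one model answers, assuming independent failures."""
    return 1.0 - per_model_failure ** chain_length


def worst_case_latency(timeout_seconds: float, chain_length: int) -> float:
    """Seconds before the user sees an error if every model times out."""
    return timeout_seconds * chain_length


# Illustrative inputs: 10% failure rate per model, 30 s timeout per attempt.
for n in range(1, 6):
    success = chain_success_probability(0.10, n)
    wait = worst_case_latency(30, n)
    print(f"{n} model(s): success={success:.5f}, worst-case wait={wait:.0f}s")
```

Under these assumptions the success probability gains very little after the third model, while the worst-case wait keeps growing linearly. In practice failures on the same provider are correlated, so the real benefit of a long same-provider chain is likely smaller still.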

Use provider prefixes when mixing

LiteLLM-style names (anthropic/..., openai/...) route credentials correctly across providers.
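To make the prefix convention concrete, the sketch below splits provider/model names and checks which credentials a mixed chain would need. It illustrates the naming scheme only and is not how LiteLLM or PraisonAI resolve credentials internally: split_model_name, missing_credentials, and the PROVIDER_ENV_VARS mapping are hypothetical helpers, using the conventional OPENAI_API_KEY and ANTHROPIC_API_KEY variable names.

```python
from typing import Dict, List, Optional, Tuple

# Conventional environment variable names; this mapping is illustrative only.
PROVIDER_ENV_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def split_model_name(name: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' into (provider, model); provider is None if unprefixed."""
    provider, sep, model = name.partition("/")
    if not sep:
        return None, name
    return provider, model


def missing_credentials(chain: List[str], env: Dict[str, str]) -> List[str]:
    """List the environment variables a chain needs that are absent from env."""
    missing: List[str] = []
    for name in chain:
        provider, _ = split_model_name(name)
        var = PROVIDER_ENV_VARS.get(provider) if provider else None
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing


chain = ["openai/gpt-4o", "anthropic/claude-3-5-sonnet", "openai/gpt-4o-mini"]
print([split_model_name(m) for m in chain])
print(missing_credentials(chain, {"OPENAI_API_KEY": "sk-placeholder"}))
```

A check like this could run at startup with os.environ as the env argument, so that a missing key for a fallback provider is caught before the fallback is ever needed.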

Related

LLM Configuration

Endpoints, API keys, and auth headers.

Models

Choosing models for agents.

Model Router

Dynamic model selection policies.

Rate Limiter

Throttle requests before they fail.
