
Model Failover automatically switches between LLM providers when one fails, ensuring your agents remain operational even during API outages or rate limits.

Quick Start

1. Configure Auth Profiles

from praisonaiagents import AuthProfile

# Create profiles for different providers
openai = AuthProfile(
    name="openai",
    provider="openai",
    api_key="sk-...",
    priority=1
)

anthropic = AuthProfile(
    name="anthropic", 
    provider="anthropic",
    api_key="sk-ant-...",
    priority=2
)
2. Set Up the Failover Manager

from praisonaiagents import FailoverConfig, FailoverManager

config = FailoverConfig(
    max_retries=3,
    retry_delay=1.0,
    exponential_backoff=True
)

manager = FailoverManager(config)
manager.add_profile(openai)
manager.add_profile(anthropic)
3. Use with Agent
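The code for this step is missing from the page. The sketch below shows one plausible wiring of the manager into an agent; the `failover_manager` parameter name is an assumption for illustration, not a confirmed part of the `Agent` API:

```python
from praisonaiagents import Agent, AuthProfile, FailoverManager

manager = FailoverManager()
manager.add_profile(AuthProfile(name="openai", provider="openai",
                                api_key="sk-...", priority=1))
manager.add_profile(AuthProfile(name="anthropic", provider="anthropic",
                                api_key="sk-ant-...", priority=2))

# Hypothetical: attach the manager so the agent's LLM calls route through failover
agent = Agent(
    instructions="You are a helpful assistant",
    failover_manager=manager,  # assumed parameter name
)
agent.start("Summarize the latest AI news")
```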


How failover activates during retries

Failover now drives LLM retries through direct integration with the retry mechanism:
  • On every LLM call, the system first gets the current profile via get_next_profile() and applies its api_key, base_url, and model settings
  • On success, mark_success(profile) is called to track the working provider
  • On failure, mark_failure(profile, error, is_rate_limit=...) marks the provider as failed, then get_next_profile() fetches the next available provider
  • Profile switching overrides the non-retryable classification: one extra attempt is always granted after switching providers
  • The LLM automatically updates request parameters (api_key, base_url, model) when switching between profiles
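Schematically, the retry integration described above behaves like the loop below. This is a simplified reimplementation for illustration only, not the library's actual code; `call_llm` and the manager passed in are stand-ins:

```python
import time

def call_with_failover(call_llm, manager, max_retries=3, retry_delay=1.0):
    """Illustrative failover loop: try the current profile, record the
    outcome, and rotate to the next available profile on failure."""
    last_error = None
    for attempt in range(max_retries + 1):
        profile = manager.get_next_profile()
        if profile is None:
            break  # every provider is currently marked failed
        try:
            result = call_llm(profile)  # applies api_key, base_url, model
            manager.mark_success(profile)
            return result
        except Exception as err:
            is_rate_limit = "429" in str(err)
            manager.mark_failure(profile, err, is_rate_limit=is_rate_limit)
            last_error = err
            time.sleep(retry_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_error}")
```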

How It Works

| Component | Role |
| --- | --- |
| `AuthProfile` | Credentials for a single provider |
| `FailoverManager` | Orchestrates failover logic |
| `FailoverConfig` | Retry and backoff settings |
| `ProviderStatus` | Tracks provider health |

Configuration Options

from praisonaiagents import FailoverConfig

config = FailoverConfig(
    max_retries=3,              # Max retry attempts
    retry_delay=1.0,            # Initial delay (seconds)
    exponential_backoff=True,   # Enable exponential backoff
    max_retry_delay=60.0,       # Max delay between retries
    failover_on_rate_limit=True, # Failover on 429 errors
    failover_on_timeout=True,   # Failover on timeouts
    failover_on_error=True,     # Failover on other errors
)
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `max_retries` | int | 3 | Maximum retry attempts |
| `retry_delay` | float | 1.0 | Initial retry delay |
| `exponential_backoff` | bool | True | Use exponential backoff |
| `max_retry_delay` | float | 60.0 | Maximum retry delay |
| `cooldown_on_rate_limit` | float | 60.0 | Rate limit cooldown (seconds) |
| `cooldown_on_error` | float | 30.0 | Error cooldown (seconds) |
| `rotate_on_success` | bool | False | Rotate profiles on success |
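With exponential backoff, the wait before retry n is typically `retry_delay * 2**n`, capped at `max_retry_delay`. That is the standard pattern rather than something confirmed from the library source; a quick sketch of the schedule produced by the defaults above:

```python
def backoff_delay(attempt, retry_delay=1.0, max_retry_delay=60.0,
                  exponential_backoff=True):
    """Delay in seconds before retry number `attempt` (0-based)."""
    if not exponential_backoff:
        return retry_delay
    return min(retry_delay * (2 ** attempt), max_retry_delay)

delays = [backoff_delay(n) for n in range(8)]
# → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Note how the cap kicks in once `2**attempt` would exceed `max_retry_delay`, so a misbehaving provider is never hammered but also never waited on for minutes at a time.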

Auth Profiles

Configure credentials for each provider:
from praisonaiagents import AuthProfile

profile = AuthProfile(
    name="openai-primary",
    provider="openai",
    api_key="sk-...",
    base_url=None,           # Custom endpoint (optional)
    priority=1,              # Lower = higher priority
    weight=1.0,              # For load balancing
    rate_limit=100,          # Requests per minute
    metadata={}              # Custom metadata
)
| Field | Type | Description |
| --- | --- | --- |
| `name` | str | Unique profile identifier |
| `provider` | str | Provider: openai, anthropic, etc. |
| `api_key` | str | API key (masked in logs) |
| `base_url` | str | Custom API endpoint |
| `model` | str | Default model for this profile |
| `priority` | int | Failover priority (lower = higher priority) |
| `rate_limit_rpm` | int | Requests per minute limit |
| `rate_limit_tpm` | int | Tokens per minute limit |
| `metadata` | dict | Additional provider-specific config |

Common Patterns

from praisonaiagents import AuthProfile, FailoverManager

manager = FailoverManager()

# Add multiple providers
manager.add_profile(AuthProfile(
    name="openai",
    provider="openai",
    api_key="sk-...",
    priority=1
))

manager.add_profile(AuthProfile(
    name="anthropic",
    provider="anthropic", 
    api_key="sk-ant-...",
    priority=2
))

manager.add_profile(AuthProfile(
    name="groq",
    provider="groq",
    api_key="gsk-...",
    priority=3
))
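Given the priority semantics above (lower value = tried first), the three profiles registered here are attempted in a fixed order. A stand-alone illustration of that ordering rule, using plain dicts rather than the library's objects:

```python
profiles = [
    {"name": "openai", "priority": 1},
    {"name": "anthropic", "priority": 2},
    {"name": "groq", "priority": 3},
]

# Failover tries the healthy profile with the lowest priority value first
order = [p["name"] for p in sorted(profiles, key=lambda p: p["priority"])]
# → ["openai", "anthropic", "groq"]
```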

Failover Callbacks

React to failover events:
from praisonaiagents import FailoverManager, FailoverConfig

def on_failover(from_profile, to_profile, error):
    print(f"Failing over from {from_profile} to {to_profile}")
    print(f"Reason: {error}")
    # Log to monitoring system
    
config = FailoverConfig(
    on_failover=on_failover
)

manager = FailoverManager(config)

Provider Status

Monitor provider health:
from praisonaiagents import FailoverManager

manager = FailoverManager()

# Get status of all providers
status = manager.status()
for name, info in status.items():
    print(f"{name}: {info['status']}")
    print(f"  Failures: {info['failure_count']}")
    print(f"  Last success: {info['last_success']}")

# Reset a provider after recovery
manager.mark_success("openai")

# Reset all profiles
manager.reset_all()

Best Practices

  • Always have at least 2-3 providers configured. This ensures availability even during major outages.
  • Enable exponential_backoff=True to avoid hammering providers during issues and to stay within rate limits.
  • Order providers by cost and reliability: put cheaper/faster providers first, with premium providers as fallback.
  • Use the on_failover callback to track when failovers occur. This helps identify provider issues early.
