PraisonAI classifies every LLM failure into one of 11 typed categories so retries, failovers, and circuit breakers can react correctly without parsing error strings.

Quick Start

1. Simple Usage

Error classification happens automatically when using agents:
from praisonaiagents import Agent

agent = Agent(
    name="Smart Agent",
    instructions="Process user requests with automatic error handling"
)

# Classification happens transparently
result = agent.start("Generate a summary")
2. Read Error Category

Access the typed error category in error hooks:
from praisonaiagents import Agent
from praisonaiagents.errors import AgentErrorKind

def handle_error(error):
    # Access typed error category instead of parsing strings
    if error.error_category == "billing":
        print("💳 Quota exceeded - contact billing")
    elif error.error_category == "rate_limit":
        print("⏸️ Rate limited - will auto-retry")
    elif error.error_category == "auth_permanent":
        print("🔑 Invalid API key - check configuration")

agent = Agent(
    name="Smart Agent",
    instructions="Handle requests with error monitoring",
    on_error=handle_error
)
3. Custom Idle-Timeout Protection

Tune the idle-timeout circuit breaker for different scenarios:
from praisonaiagents import Agent
from praisonaiagents.errors import IdleTimeoutBreaker
from praisonaiagents.llm import LLM

# Custom breaker for slow self-hosted models
breaker = IdleTimeoutBreaker(max_consecutive=5)

llm = LLM(
    model="self-hosted/slow-model",
    idle_timeout_breaker=breaker,
    max_iter=15
)

agent = Agent(
    name="Patient Agent", 
    instructions="Work with slow models",
    llm=llm
)

How It Works

The classification system converts every LLM exception into a structured FailoverDecision that tells retry logic exactly what to do.

Error Categories

All LLM failures are classified into these 11 typed categories:
| Kind | Triggers (examples) | Default Action | Retryable |
| --- | --- | --- | --- |
| auth | unauthorized, api key, authentication failed | rotate_profile (if failover enabled) | Yes |
| auth_permanent | invalid api key, incorrect api key | surface_error | No |
| rate_limit | rate limit, 429, resource_exhausted | retry (with parsed/exponential backoff, max 60s) | Yes |
| overloaded | 503, 502, 500, service unavailable | retry (2s→4s→8s, capped at 30s) | Yes |
| context_overflow | maximum context length, context window is too long | surface_error | No |
| idle_timeout | timeout, timed out, deadline exceeded | retry until breaker hits 3, then surface_error | Yes (until breaker) |
| billing | insufficient quota, quota exceeded, payment required | surface_error | No |
| model_not_found | model not found, unknown model | surface_error | No |
| empty_response | empty response, json decode error | retry (limited) | Limited |
| format_error | validation error, invalid json, schema error | surface_error | No |
| unknown | anything else | retry for attempt ≤ 2, then surface_error | Limited |
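
The trigger examples above can be read as an ordered substring match. The following is a minimal sketch of that idea, assuming case-insensitive matching where the first matching rule wins. `RULES` and `classify_message` are hypothetical names used for illustration only; PraisonAI's actual classifier may use different rules, ordering, or signals such as HTTP status codes.

```python
# Illustrative only: a first-match-wins keyword classifier built from the
# trigger examples listed above. This is not PraisonAI's implementation.
RULES = [
    # More specific kinds come first: "invalid api key" also contains "api key".
    ("auth_permanent", ("invalid api key", "incorrect api key")),
    ("auth", ("unauthorized", "api key", "authentication failed")),
    ("billing", ("insufficient quota", "quota exceeded", "payment required")),
    ("rate_limit", ("rate limit", "429", "resource_exhausted")),
    ("context_overflow", ("maximum context length", "context window is too long")),
    ("model_not_found", ("model not found", "unknown model")),
    ("idle_timeout", ("timeout", "timed out", "deadline exceeded")),
    ("overloaded", ("503", "502", "500", "service unavailable")),
    ("empty_response", ("empty response", "json decode error")),
    ("format_error", ("validation error", "invalid json", "schema error")),
]


def classify_message(message: str) -> str:
    """Return the first category whose trigger appears in the message."""
    text = message.lower()
    for kind, triggers in RULES:
        if any(trigger in text for trigger in triggers):
            return kind
    # Nothing matched: fall back to the catch-all category.
    return "unknown"
```

Rule order matters in a sketch like this: checking `auth_permanent` before `auth` keeps an invalid key from being treated as a rotatable auth failure. Bare substrings such as "500" can also match unrelated text, which is one reason to rely on the typed `error_category` field rather than writing your own matcher.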

FailoverDecision Structure

Every classification produces a FailoverDecision with these fields:
| Field | Type | Description |
| --- | --- | --- |
| action | "retry", "rotate_profile", or "surface_error" | What action to take |
| reason | AgentErrorKind | The classified error type |
| backoff_ms | int | Milliseconds to wait before retry (0 = immediate) |
| is_retryable | bool | Whether this error is worth retrying |
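
To make the field list concrete, here is a small stand-in for the decision record, together with the `overloaded` backoff schedule (2s→4s→8s, capped at 30s) from the categories table. `DecisionSketch`, `decide_overloaded`, and `decide_billing` are hypothetical names, `reason` is a plain string standing in for `AgentErrorKind`, and the 1-indexed `attempt` counter is an assumption; the library's own `FailoverDecision` may be constructed differently.

```python
from dataclasses import dataclass
from typing import Literal

Action = Literal["retry", "rotate_profile", "surface_error"]


@dataclass(frozen=True)
class DecisionSketch:
    """Hypothetical stand-in mirroring the four documented fields."""
    action: Action
    reason: str          # stands in for AgentErrorKind
    backoff_ms: int      # 0 means retry immediately
    is_retryable: bool


def decide_overloaded(attempt: int) -> DecisionSketch:
    """Exponential backoff for `overloaded`: 2s, 4s, 8s, ... capped at 30s.

    `attempt` is assumed to start at 1 for the first retry.
    """
    backoff_ms = min(2000 * 2 ** (attempt - 1), 30_000)
    return DecisionSketch("retry", "overloaded", backoff_ms, True)


def decide_billing() -> DecisionSketch:
    """Billing failures are surfaced immediately and never retried."""
    return DecisionSketch("surface_error", "billing", 0, False)
```

Retry logic written against a record like this only needs to branch on `action` and sleep for `backoff_ms`; it never has to inspect the error message.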

Idle-Timeout Circuit Breaker

The idle-timeout circuit breaker is separate from the per-tool circuit breaker. It protects against LLM provider stalls:
  • Default: Stops after 3 consecutive idle_timeout failures
  • Auto-resets: On any successful LLM call
  • Only triggered by: idle_timeout error kind (not other timeouts)
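
The three rules above amount to a consecutive-failure counter. A minimal sketch follows, assuming that failures of other kinds leave the counter unchanged (the text only says they do not trigger the breaker). `IdleBreakerSketch` and its methods are hypothetical and are not the API of `IdleTimeoutBreaker`.

```python
class IdleBreakerSketch:
    """Hypothetical illustration of a consecutive idle-timeout breaker."""

    def __init__(self, max_consecutive: int = 3) -> None:
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record_failure(self, kind: str) -> bool:
        """Count the failure and return True once the breaker is open."""
        # Only idle_timeout failures count toward the threshold.
        if kind == "idle_timeout":
            self.consecutive += 1
        return self.consecutive >= self.max_consecutive

    def record_success(self) -> None:
        """Any successful LLM call resets the counter."""
        self.consecutive = 0
```

With the default threshold, the first two idle timeouts are retried and the third opens the breaker; one successful call in between starts the count over.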


Common Patterns

Log Every Classified Failure

from praisonaiagents import Agent
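from praisonaiagents.errors import LLMError

def send_alert(channel: str, message: str) -> None:
    # Hypothetical placeholder: replace with your alerting integration
    print(f"[{channel}] {message}")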

def error_logger(error: LLMError):
    print(f"⚠️ LLM {error.error_category}: {error.message}")
    
    # Route specific error types
    if error.error_category == "billing":
        # Alert ops team
        send_alert("billing", error.message)
    elif error.error_category == "auth_permanent":
        # Alert dev team
        send_alert("config", error.message)

agent = Agent(
    name="Monitored Agent",
    instructions="Track all LLM failures",
    on_error=error_logger
)

Custom Breaker for Slow Models

from praisonaiagents import Agent
from praisonaiagents.errors import IdleTimeoutBreaker
from praisonaiagents.llm import LLM

# More patient with self-hosted models
slow_model_breaker = IdleTimeoutBreaker(max_consecutive=8)

agent = Agent(
    name="Self-Hosted Agent",
    llm=LLM(
        model="ollama/custom-model",
        idle_timeout_breaker=slow_model_breaker,
        timeout=120  # 2 minute timeout
    )
)

Gate Alerts by Error Type

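import logging

from praisonaiagents import Agent

logger = logging.getLogger(__name__)

def send_slack_alert(text: str) -> None:
    # Hypothetical placeholder: replace with your Slack integration
    logger.warning(text)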

def smart_alerting(error):
    # Only alert on errors that need human intervention
    alert_worthy = [
        "auth_permanent", "model_not_found", 
        "context_overflow", "billing"
    ]
    
    if error.error_category in alert_worthy:
        send_slack_alert(f"🚨 {error.error_category}: {error.message}")
    else:
        # Just log retryable errors
        logger.info(f"Retryable {error.error_category} - auto-handling")

agent = Agent(
    name="Smart Alerting Agent",
    instructions="Only alert on actionable errors",
    on_error=smart_alerting
)

Legacy Migration

The old error_category string values still work but emit a DeprecationWarning. Update to the new typed categories for cleaner code.
| Old error_category | New AgentErrorKind | Migration |
| --- | --- | --- |
| "tool" | "unknown" | Update error classification logic |
| "llm" | "unknown" | More specific classification available |
| "budget" | "billing" | Direct replacement |
| "validation" | "format_error" | Direct replacement |
| "network" | "unknown" | Use specific network error kinds |
| "handoff" | "unknown" | Agent handoff errors are separate |
Example migration:
# OLD (deprecated - emits warning)
if error.error_category == "budget":
    handle_budget_error()

# NEW (recommended)  
if error.error_category == "billing":
    handle_billing_error()
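
If handlers in your own codebase still compare against the old strings, a small shim can translate them while you migrate. This is a sketch that uses only the standard-library `warnings` module and the mapping from the migration table. `LEGACY_TO_NEW` and `normalize_category` are hypothetical helpers and are not part of PraisonAI.

```python
import warnings

# Old value -> new typed category, copied from the migration table.
LEGACY_TO_NEW = {
    "tool": "unknown",
    "llm": "unknown",
    "budget": "billing",
    "validation": "format_error",
    "network": "unknown",
    "handoff": "unknown",
}


def normalize_category(category: str) -> str:
    """Map a legacy category to its typed replacement, warning on old values."""
    new = LEGACY_TO_NEW.get(category)
    if new is None:
        # Already a typed category (or unrecognized): pass it through.
        return category
    warnings.warn(
        f"error_category {category!r} is deprecated; use {new!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return new
```

Running your test suite with `python -W error::DeprecationWarning` is one way to find the remaining call sites that still pass old values.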

Best Practices

Errors classified as auth_permanent, model_not_found, and context_overflow indicate configuration problems, not transient failures. Set up monitoring to catch these during development.
permanent_errors = ["auth_permanent", "model_not_found", "context_overflow"]
if error.error_category in permanent_errors:
    # Log as config issue, not service degradation
    config_logger.error(f"Config issue: {error.error_category}")
Fast cloud models can use the default max_consecutive=3. Slow self-hosted models should increase this to avoid premature circuit breaking.
from praisonaiagents.errors import IdleTimeoutBreaker

# Fast cloud model (default)
fast_breaker = IdleTimeoutBreaker()  # max_consecutive=3

# Slow self-hosted model  
slow_breaker = IdleTimeoutBreaker(max_consecutive=8)
Instead of parsing error messages, use the typed error_category field for reliable error handling.
# AVOID: String parsing
if "quota" in str(exception):
    handle_quota()

# PREFER: Typed classification
if error.error_category == "billing":
    handle_billing_issue()
Combine error classification with Model Failover to automatically switch providers on auth errors.
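from praisonaiagents import Agent
# Assumption: AuthProfile is importable from the same module as FailoverManager
from praisonaiagents.failover import AuthProfile, FailoverManager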

failover = FailoverManager([
    AuthProfile(provider="openai", api_key="..."),
    AuthProfile(provider="anthropic", api_key="...")
])

agent = Agent(
    name="Resilient Agent",
    llm={"model": "gpt-4o", "failover_manager": failover}
)
# Auth errors will automatically rotate to Anthropic

Structured LLM Errors

Foundation error handling with LLMError structure

Model Failover

Cross-provider failover with FailoverManager

Tool Circuit Breaker

Per-tool circuit breaking for tool execution

Execution Config

Configure max_iter and other execution parameters