LLM Error Classification

PraisonAI classifies every LLM failure into one of 11 typed categories so retries, failovers, and circuit breakers can react correctly without parsing error strings.

Quick Start

Simple Usage

Error classification happens automatically when using agents:

from praisonaiagents import Agent

agent = Agent(
    name="Smart Agent",
    instructions="Process user requests with automatic error handling"
)

# Classification happens transparently
result = agent.start("Generate a summary")

Read Error Category

Access the typed error category in error hooks:

from praisonaiagents import Agent
from praisonaiagents.errors import AgentErrorKind

def handle_error(error):
    # Access typed error category instead of parsing strings
    if error.error_category == "billing":
        print("💳 Quota exceeded - contact billing")
    elif error.error_category == "rate_limit":
        print("⏸️ Rate limited - will auto-retry")
    elif error.error_category == "auth_permanent":
        print("🔑 Invalid API key - check configuration")

agent = Agent(
    name="Smart Agent",
    instructions="Handle requests with error monitoring",
    on_error=handle_error
)

Custom Idle-Timeout Protection

Tune the idle-timeout circuit breaker for different scenarios:

from praisonaiagents import Agent
from praisonaiagents.errors import IdleTimeoutBreaker
from praisonaiagents.llm import LLM

# Custom breaker for slow self-hosted models
breaker = IdleTimeoutBreaker(max_consecutive=5)

llm = LLM(
    model="self-hosted/slow-model",
    idle_timeout_breaker=breaker,
    max_iter=15
)

agent = Agent(
    name="Patient Agent", 
    instructions="Work with slow models",
    llm=llm
)

How It Works

The classification system converts every LLM exception into a structured FailoverDecision that tells retry logic exactly what to do.

Error Categories

All LLM failures are classified into these 11 typed categories:

Kind	Triggers (examples)	Default Action	Retryable
`auth`	`unauthorized`, `api key`, `authentication failed`	`rotate_profile` (if failover enabled)	Yes
`auth_permanent`	`invalid api key`, `incorrect api key`	`surface_error`	No
`rate_limit`	`rate limit`, `429`, `resource_exhausted`	`retry` (with parsed/exponential backoff, max 60s)	Yes
`overloaded`	`503`, `502`, `500`, `service unavailable`	`retry` (2s→4s→8s, capped at 30s)	Yes
`context_overflow`	`maximum context length`, `context window is too long`	`surface_error`	No
`idle_timeout`	`timeout`, `timed out`, `deadline exceeded`	`retry` until breaker hits 3, then `surface_error`	Yes (until breaker)
`billing`	`insufficient quota`, `quota exceeded`, `payment required`	`surface_error`	No
`model_not_found`	`model not found`, `unknown model`	`surface_error`	No
`empty_response`	`empty response`, `json decode error`	`retry` (limited)	Limited
`format_error`	`validation error`, `invalid json`, `schema error`	`surface_error`	No
`unknown`	anything else	`retry` for attempt ≤ 2, then `surface_error`	Limited

FailoverDecision Structure

Every classification produces a FailoverDecision with these fields:

Field	Type	Description
`action`	`"retry"` \| `"rotate_profile"` \| `"surface_error"`	What action to take
`reason`	`AgentErrorKind`	The classified error type
`backoff_ms`	`int`	Milliseconds to wait before retry (0 = immediate)
`is_retryable`	`bool`	Whether this error is worth retrying

Idle-Timeout Circuit Breaker

The idle-timeout circuit breaker is separate from the per-tool circuit breaker. It protects against LLM provider stalls:

Default: Stops after 3 consecutive idle_timeout failures
Auto-resets: On any successful LLM call
Only triggered by: idle_timeout error kind (not other timeouts)

Choosing Between Options

Common Patterns

Log Every Classified Failure

from praisonaiagents import Agent
from praisonaiagents.errors import LLMError

def error_logger(error: LLMError):
    print(f"⚠️ LLM {error.error_category}: {error.message}")
    
    # Route specific error types
    if error.error_category == "billing":
        # Alert ops team
        send_alert("billing", error.message)
    elif error.error_category == "auth_permanent":
        # Alert dev team
        send_alert("config", error.message)

agent = Agent(
    name="Monitored Agent",
    instructions="Track all LLM failures",
    on_error=error_logger
)

Custom Breaker for Slow Models

from praisonaiagents import Agent
from praisonaiagents.errors import IdleTimeoutBreaker
from praisonaiagents.llm import LLM

# More patient with self-hosted models
slow_model_breaker = IdleTimeoutBreaker(max_consecutive=8)

agent = Agent(
    name="Self-Hosted Agent",
    llm=LLM(
        model="ollama/custom-model",
        idle_timeout_breaker=slow_model_breaker,
        timeout=120  # 2 minute timeout
    )
)

Gate Alerts by Error Type

from praisonaiagents import Agent

def smart_alerting(error):
    # Only alert on errors that need human intervention
    alert_worthy = [
        "auth_permanent", "model_not_found", 
        "context_overflow", "billing"
    ]
    
    if error.error_category in alert_worthy:
        send_slack_alert(f"🚨 {error.error_category}: {error.message}")
    else:
        # Just log retryable errors
        logger.info(f"Retryable {error.error_category} - auto-handling")

agent = Agent(
    name="Smart Alerting Agent",
    instructions="Only alert on actionable errors",
    on_error=smart_alerting
)

Legacy Migration

The old error_category string values still work but emit a DeprecationWarning. Update to the new typed categories for cleaner code.

Old `error_category`	New `AgentErrorKind`	Migration
`"tool"`	`"unknown"`	Update error classification logic
`"llm"`	`"unknown"`	More specific classification available
`"budget"`	`"billing"`	Direct replacement
`"validation"`	`"format_error"`	Direct replacement
`"network"`	`"unknown"`	Use specific network error kinds
`"handoff"`	`"unknown"`	Agent handoff errors are separate

Example migration:

# OLD (deprecated - emits warning)
if error.error_category == "budget":
    handle_budget_error()

# NEW (recommended)  
if error.error_category == "billing":
    handle_billing_error()

Best Practices

Treat Permanent Errors as Config Issues

Errors classified as auth_permanent, model_not_found, and context_overflow indicate configuration problems, not transient failures. Set up monitoring to catch these during development.

permanent_errors = ["auth_permanent", "model_not_found", "format_error"]
if error.error_category in permanent_errors:
    # Log as config issue, not service degradation
    config_logger.error(f"Config issue: {error.error_category}")

Tune Circuit Breaker by Model Speed

Fast cloud models can use the default max_consecutive=3. Slow self-hosted models should increase this to avoid premature circuit breaking.

# Fast cloud model (default)
fast_breaker = IdleTimeoutBreaker()  # max_consecutive=3

# Slow self-hosted model  
slow_breaker = IdleTimeoutBreaker(max_consecutive=8)

Use Typed Categories Over String Matching

Instead of parsing error messages, use the typed error_category field for reliable error handling.

# AVOID: String parsing
if "quota" in str(exception):
    handle_quota()

# PREFER: Typed classification
if error.error_category == "billing":
    handle_billing_issue()

Pair with Model Failover for Cross-Provider Resilience

Combine error classification with Model Failover to automatically switch providers on auth errors.

from praisonaiagents.failover import FailoverManager

failover = FailoverManager([
    AuthProfile(provider="openai", api_key="..."),
    AuthProfile(provider="anthropic", api_key="...")
])

agent = Agent(
    name="Resilient Agent",
    llm={"model": "gpt-4o", "failover_manager": failover}
)
# Auth errors will automatically rotate to Anthropic

Structured LLM Errors

Foundation error handling with LLMError structure

Model Failover

Cross-provider failover with FailoverManager

Tool Circuit Breaker

Per-tool circuit breaking for tool execution

Execution Config

Configure max_iter and other execution parameters

​Quick Start

​How It Works

​Error Categories

​FailoverDecision Structure

​Idle-Timeout Circuit Breaker

​Choosing Between Options

​Common Patterns

​Log Every Classified Failure

​Custom Breaker for Slow Models

​Gate Alerts by Error Type

​Legacy Migration

​Best Practices

​Related

Structured LLM Errors

Model Failover

Tool Circuit Breaker

Execution Config

Quick Start

How It Works

Error Categories

FailoverDecision Structure

Idle-Timeout Circuit Breaker

Choosing Between Options

Common Patterns

Log Every Classified Failure

Custom Breaker for Slow Models

Gate Alerts by Error Type

Legacy Migration

Best Practices

Related