Tool retry automatically re-runs a failing tool with exponential backoff so transient errors don’t break your agent.

Quick Start

1. Enable with defaults

Enable retry for all tools with safe defaults:
from praisonaiagents import Agent
from praisonaiagents.tools.retry import RetryPolicy

def web_search(query: str) -> str:
    """Placeholder tool: replace with your real web search."""
    return f"Search results for: {query}"

agent = Agent(
    name="researcher",
    instructions="Research topics on the web",
    tools=[web_search],
    tool_retry_policy=RetryPolicy()  # 3 attempts, exponential backoff
)

agent.start("Find information about renewable energy")

2. Tune attempts and backoff

Configure specific retry behavior:
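from praisonaiagents import Agent
from praisonaiagents.tools.retry import RetryPolicy

def api_tool(endpoint: str) -> str:
    """Placeholder tool: replace with your real API client."""
    return f"Response from {endpoint}"

agent = Agent(
    name="api_agent",
    instructions="Call external APIs",
    tools=[api_tool],
    tool_retry_policy=RetryPolicy(
        max_attempts=5,          # first try + up to 4 retries
        retry_on={"timeout", "rate_limit", "connection_error"},
        backoff_factor=2.0,      # delay doubles after each failed attempt
        initial_delay_ms=1000,   # wait 1 s before the first retry
        jitter=True              # randomize delays to spread out retries
    )
)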

3. Override per tool

Different tools may need different retry strategies:
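from praisonaiagents import Agent
from praisonaiagents.tools import tool
from praisonaiagents.tools.retry import RetryPolicy

def call_external_api(query: str) -> str:
    """Placeholder for your real API client call."""
    return f"API response for: {query}"

@tool(retry_policy=RetryPolicy(max_attempts=5, backoff_factor=3.0))
def unreliable_api_call(query: str) -> str:
    """Call an unreliable external API."""
    # This tool gets its own, more aggressive retry policy
    return call_external_api(query)

def local_tool(text: str) -> str:
    """A reliable local tool; it uses the agent-level policy."""
    return text.upper()

agent = Agent(
    name="mixed_agent",
    tools=[local_tool, unreliable_api_call],
    tool_retry_policy=RetryPolicy(max_attempts=2)  # Default for other tools
)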

How It Works

| Step | What happens |
|------|--------------|
| 1 | The agent calls the tool via _execute_tool_with_circuit_breaker (sync) or execute_tool_async (async). |
| 2 | On error, _classify_error_type tags it as timeout, rate_limit, connection_error, or unknown. |
| 3 | _get_tool_retry_policy resolves the active policy (tool > agent > default). |
| 4 | If policy.should_retry(error_type, attempt) is true, it waits policy.get_delay_ms(attempt) milliseconds, then retries the tool. |
| 5 | HookEvent.ON_RETRY fires before each retry with an OnRetryInput describing the retry. |
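The steps above can be sketched as a plain retry loop. This is a minimal, hypothetical sketch, not PraisonAI's implementation: call_with_retry and classify_error are illustrative names, and the mapping from exceptions to error types (TimeoutError to timeout, ConnectionError to connection_error, "429" or "rate limit" in the message to rate_limit) is an assumption. The policy argument is any object exposing the should_retry(error_type, attempt) and get_delay_ms(attempt) methods named in step 4.

```python
import time
from typing import Any, Callable, Optional


def classify_error(exc: Exception) -> str:
    """Assumed mapping from exceptions to error types (illustrative only)."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connection_error"
    message = str(exc).lower()
    if "429" in message or "rate limit" in message:
        return "rate_limit"
    return "unknown"


def call_with_retry(
    fn: Callable[[], Any],
    policy: Any,
    on_retry: Optional[Callable[[int, float, str, Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying failures for as long as the policy allows."""
    attempt = 1  # 1-based number of the attempt being made
    while True:
        try:
            return fn()
        except Exception as exc:
            error_type = classify_error(exc)
            # Step 4: re-raise unless the policy allows another attempt
            if not policy.should_retry(error_type, attempt):
                raise
            delay_ms = policy.get_delay_ms(attempt)
            # Step 5: notify observers before waiting and retrying
            if on_retry is not None:
                on_retry(attempt, delay_ms, error_type, exc)
            sleep(delay_ms / 1000)
            attempt += 1
```

Injecting sleep keeps the loop testable without real waiting; an async variant would await asyncio.sleep instead of calling time.sleep.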

Precedence Ladder

Tool-level (highest priority):
@tool(retry_policy=RetryPolicy(max_attempts=5))
def flaky_api():
    pass
Agent-level (medium priority):
agent = Agent(tool_retry_policy=RetryPolicy(max_attempts=3))
Default (lowest priority):
# No tool- or agent-level policy: RetryPolicy() defaults apply
agent = Agent(tools=[my_tool])
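The ladder amounts to "the first policy found wins". A minimal sketch, assuming the winning policy replaces the lower levels entirely rather than being merged field by field; Policy and resolve_policy are hypothetical names, not the library's _get_tool_retry_policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for RetryPolicy, reduced to one field."""
    max_attempts: int = 3


def resolve_policy(tool_policy: Optional[Policy], agent_policy: Optional[Policy]) -> Policy:
    """Pick the active policy: tool-level, then agent-level, then defaults."""
    if tool_policy is not None:
        return tool_policy
    if agent_policy is not None:
        return agent_policy
    return Policy()


# Mirrors the mixed_agent example: one tool overrides, the rest inherit
agent_policy = Policy(max_attempts=2)
print(resolve_policy(Policy(max_attempts=5), agent_policy).max_attempts)  # 5
print(resolve_policy(None, agent_policy).max_attempts)                    # 2
print(resolve_policy(None, None).max_attempts)                            # 3
```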


Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| max_attempts | int | 3 | Total attempts including the first try |
| initial_delay_ms | int | 1000 | Delay before first retry, in milliseconds |
| backoff_factor | float | 2.0 | Multiplier applied to delay per attempt |
| retry_on | set[str] | {"timeout", "rate_limit", "connection_error"} | Error types that trigger a retry |
| jitter | bool | False | Add randomized jitter to delays |
| jitter_factor | float | 0.25 | Jitter range as fraction of delay (±25%) |
| max_delay_ms | int | 30000 | Maximum delay between retries, in milliseconds |
Non-retryable error types (always short-circuit):
  • approval_denied, permission_denied, approval_error, circuit_open
  • Python exceptions: ValueError, TypeError, AttributeError from tool code
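Read together, retry_on, max_attempts and the non-retryable list suggest a decision rule like the one below. It is a hypothetical sketch using the documented defaults, not the library's RetryPolicy.should_retry; in particular, checking max_attempts against the 1-based number of the attempt that just failed is an assumption.

```python
from typing import AbstractSet

DEFAULT_RETRY_ON = frozenset({"timeout", "rate_limit", "connection_error"})
# Documented as never retried, whatever retry_on contains
NON_RETRYABLE = frozenset({"approval_denied", "permission_denied", "approval_error", "circuit_open"})


def should_retry(error_type: str, attempt: int, max_attempts: int = 3,
                 retry_on: AbstractSet[str] = DEFAULT_RETRY_ON) -> bool:
    """Decide whether to retry after the 1-based `attempt` has failed."""
    if error_type in NON_RETRYABLE:
        return False  # short-circuit: retrying cannot fix these
    if error_type not in retry_on:
        return False  # e.g. unclassified "unknown" errors
    return attempt < max_attempts  # max_attempts includes the first try
```

With the defaults, a timeout on attempt 1 or 2 triggers a retry, while a failure on attempt 3 ends the loop.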

Common Patterns

Per-tool override for unreliable API

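import requests
from praisonaiagents import Agent
from praisonaiagents.tools import tool
from praisonaiagents.tools.retry import RetryPolicy

@tool(retry_policy=RetryPolicy(
    max_attempts=5,
    backoff_factor=3.0,
    jitter=True,
    retry_on={"timeout", "rate_limit", "connection_error"}
))
def external_weather_api(location: str) -> str:
    """Get weather from an external API that is known to be flaky."""
    response = requests.get(
        "https://api.weather.com/current",
        params={"q": location},  # URL-encodes the location
        timeout=10,  # fail a hung request instead of blocking forever
    )
    response.raise_for_status()  # raise on HTTP errors such as 429
    return response.text

def local_calculation(expression: str) -> str:
    """Placeholder for a reliable local tool."""
    return expression

agent = Agent(
    name="weather_bot",
    tools=[external_weather_api, local_calculation],
    tool_retry_policy=RetryPolicy(max_attempts=2)  # Default for other tools
)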

YAML configuration

agents:
  api_researcher:
    role: API Researcher
    instructions: "Research using external APIs"
    tools: [web_search, api_tool]
    tool_retry_policy:
      max_attempts: 4
      retry_on: [timeout, rate_limit]
      backoff_factor: 2.0
      jitter: true

CLI usage

praisonai \
  --tool-retry-attempts 5 \
  --tool-retry-delay 500 \
  --tool-retry-backoff 2.0 \
  --tool-retry-on "timeout,rate_limit" \
  "Research renewable energy trends"

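Each flag appears to correspond to one RetryPolicy field. The sketch below is a hypothetical argparse parser, not PraisonAI's actual CLI code, that mirrors these flags to make the assumed mapping explicit: --tool-retry-attempts to max_attempts, --tool-retry-delay (milliseconds) to initial_delay_ms, --tool-retry-backoff to backoff_factor, and the comma-separated --tool-retry-on to the retry_on set.

```python
import argparse
from typing import Dict, List, Tuple


def parse_error_types(value: str) -> set:
    """Turn "timeout,rate_limit" into {"timeout", "rate_limit"}."""
    return {part.strip() for part in value.split(",") if part.strip()}


def parse_retry_flags(argv: List[str]) -> Tuple[str, Dict[str, object]]:
    """Return the prompt and RetryPolicy keyword arguments (assumed mapping)."""
    parser = argparse.ArgumentParser(prog="praisonai")
    parser.add_argument("--tool-retry-attempts", type=int, dest="max_attempts")
    parser.add_argument("--tool-retry-delay", type=int, dest="initial_delay_ms")
    parser.add_argument("--tool-retry-backoff", type=float, dest="backoff_factor")
    parser.add_argument("--tool-retry-on", type=parse_error_types, dest="retry_on")
    parser.add_argument("prompt")
    args = vars(parser.parse_args(argv))
    prompt = args.pop("prompt")
    # Only flags that were actually given override the RetryPolicy defaults
    return prompt, {key: value for key, value in args.items() if value is not None}
```

Under that assumption, the command above corresponds to RetryPolicy(max_attempts=5, initial_delay_ms=500, backoff_factor=2.0, retry_on={"timeout", "rate_limit"}), with jitter left at its default.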
Hook Integration

Monitor retry attempts with hooks:
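from praisonaiagents import Agent
from praisonaiagents.hooks import HookEvent, OnRetryInput, HookResult
from praisonaiagents.hooks.registry import registry
from praisonaiagents.tools.retry import RetryPolicy

@registry.on(HookEvent.ON_RETRY)
def log_retry(event: OnRetryInput) -> HookResult:
    # Called before each retry with the classified error and planned delay
    print(f"[retry] {event.tool_name} attempt {event.attempt}/{event.max_attempts} "
          f"after {event.delay_ms}ms - {event.error_type}: {event.error}")
    return HookResult.allow()

def flaky_tool(query: str) -> str:
    """Placeholder for a tool that sometimes fails transiently."""
    return f"Result for: {query}"

agent = Agent(
    name="monitored_agent",
    tools=[flaky_tool],
    tool_retry_policy=RetryPolicy(max_attempts=3)
)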
Available fields on OnRetryInput:
  • tool_name: Name of the failing tool
  • attempt: Current attempt number (1-based)
  • max_attempts: Maximum attempts configured
  • delay_ms: Delay before this retry in milliseconds
  • error_type: Classified error type (timeout, rate_limit, etc.)
  • error: Original exception object
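To monitor retries over time rather than only printing them, a hook can aggregate these fields. The sketch below is hypothetical: RetryEvent is a plain dataclass standing in for OnRetryInput (same field names as the list above), and RetryStats is not part of PraisonAI. Inside a real ON_RETRY hook such as log_retry, you would call stats.record(event) before returning HookResult.allow().

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class RetryEvent:
    """Hypothetical stand-in with the same fields as OnRetryInput."""
    tool_name: str
    attempt: int
    max_attempts: int
    delay_ms: int
    error_type: str
    error: Exception


class RetryStats:
    """Accumulates retry counts and total backoff time (illustrative only)."""

    def __init__(self) -> None:
        self.retries_by_tool: Counter = Counter()
        self.retries_by_error_type: Counter = Counter()
        self.total_delay_ms = 0

    def record(self, event: RetryEvent) -> None:
        self.retries_by_tool[event.tool_name] += 1
        self.retries_by_error_type[event.error_type] += 1
        self.total_delay_ms += event.delay_ms

    def noisy_tools(self, threshold: int) -> List[str]:
        """Tools that needed at least `threshold` retries: worth investigating."""
        return [name for name, count in self.retries_by_tool.items() if count >= threshold]
```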

Best Practices

Keep max_attempts small. Large retry counts mask real failures: if a tool fails 10 or more times, there’s likely a deeper issue that retrying won’t solve. Surface it with monitoring (for example, an ON_RETRY hook) instead of retrying longer.
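Large counts also stretch latency. Assuming plain exponential backoff (the n-th retry waits initial_delay_ms × backoff_factor^(n-1), capped at max_delay_ms, with no jitter), the worst-case total wait can be computed directly. delay_schedule_ms is an illustrative helper, not a library function, and the exact formula RetryPolicy.get_delay_ms uses is an assumption here.

```python
from typing import List


def delay_schedule_ms(max_attempts: int = 3, initial_delay_ms: int = 1000,
                      backoff_factor: float = 2.0, max_delay_ms: int = 30000) -> List[float]:
    """Wait before each retry; max_attempts counts the first try, so there are max_attempts - 1 waits."""
    return [min(initial_delay_ms * backoff_factor ** n, max_delay_ms)
            for n in range(max_attempts - 1)]


print(delay_schedule_ms())         # [1000.0, 2000.0]
print(sum(delay_schedule_ms(10)))  # 151000.0 ms, about 2.5 minutes
```

With the defaults, a tool that keeps failing costs 3 seconds of waiting; at 10 attempts the agent can stall for about two and a half minutes before the failure surfaces.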
Without jitter, multiple agents retrying simultaneously create a “thundering herd” that can overwhelm rate-limited services. Jitter spreads out retry attempts.
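The effect is easy to see by comparing first-retry times for agents that all failed at the same moment. A minimal sketch, assuming jitter is drawn uniformly from ±jitter_factor of the base delay (the configuration table documents the ±25% range but not the distribution); jittered_delay_ms and first_retry_times are hypothetical helpers.

```python
import random
from typing import List, Optional


def jittered_delay_ms(base_ms: float, jitter_factor: float = 0.25,
                      rng: Optional[random.Random] = None) -> float:
    """Base delay scaled by a uniform factor in [1 - jitter_factor, 1 + jitter_factor]."""
    if rng is None:
        rng = random.Random()
    return base_ms * (1 + rng.uniform(-jitter_factor, jitter_factor))


def first_retry_times(n_agents: int, base_ms: float = 1000, jitter: bool = False,
                      seed: int = 0) -> List[float]:
    """When each of n_agents, all failing at t=0, sends its first retry."""
    if not jitter:
        return [base_ms] * n_agents  # every agent hits the service at the same instant
    rng = random.Random(seed)
    return [jittered_delay_ms(base_ms, rng=rng) for _ in range(n_agents)]


print(first_retry_times(5))               # [1000, 1000, 1000, 1000, 1000]
print(first_retry_times(5, jitter=True))  # five different times between 750 and 1250 ms
```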
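If every attempt costs money, as with LLM-backed tools, don’t retry on the full default set (which includes connection_error). Restrict retry_on to the specific transient error types you are willing to pay to retry:
from praisonaiagents.tools.retry import RetryPolicy

expensive_llm_tool_policy = RetryPolicy(
    max_attempts=2,
    retry_on={"timeout"}  # Only timeouts, not connection errors
)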
An agent-level retry policy keeps configuration DRY. Override at the tool level only for genuinely special cases, such as unreliable third-party APIs.

Related

  • Tool Configuration: consolidated tool configuration with ToolConfig
  • Concurrency: parallel tool execution and timeouts
  • Hooks: monitor and intercept agent behavior
  • Hook Events: complete reference of hook events