Make your agents survive flaky LLMs, hung workflows, and broken callbacks.

Quick Start


Simple Usage

from praisonaiagents import Agent, Task, PraisonAIAgents

task = Task(
    description="Summarise the article",
    fail_on_callback_error=True,   # surface callback bugs instead of swallowing
    fail_on_memory_error=False,    # tolerate memory hiccups
)

workflow = PraisonAIAgents(
    agents=[Agent(name="Writer", instructions="Summarise clearly")],
    tasks=[task],
    workflow_timeout=120,           # hard kill after 2 min (sync + async)
)
workflow.start()

Production Configuration

import logging

from praisonaiagents import Agent, Task, PraisonAIAgents

logger = logging.getLogger(__name__)

# Strict mode for CI/testing
strict_task = Task(
    description="Validate the output",
    fail_on_callback_error=True,
    fail_on_memory_error=True,
)

# Lenient mode for production
lenient_task = Task(
    description="Generate content",
    fail_on_callback_error=False,  # default
    fail_on_memory_error=False,    # default
)

workflow = PraisonAIAgents(
    agents=[Agent(name="Validator", instructions="Check quality")],
    tasks=[strict_task, lenient_task],
    workflow_timeout=300,
)

result = workflow.start()
# Check for non-fatal errors in production
if result.non_fatal_errors:
    logger.warning(f"Non-fatal errors: {result.non_fatal_errors}")

How It Works

| Component | Purpose | Behavior |
|---|---|---|
| Retry Jitter | Prevents thundering herd | Random delays for multi-agent rate limits |
| Workflow Timeout | Stops hung processes | Hard kill after the specified number of seconds |
| Failure Policies | Controls error handling | Surface or swallow exceptions |

Retry Jitter (LLM Backoff)

Prevents multi-agent thundering herd when many agents hit rate limits at once.
| Error category | Behavior | Floor | Cap |
|---|---|---|---|
| RATE_LIMIT | Exponential backoff (×3) + full jitter | base_delay (default 1.0 s) | 60.0 s |
| TRANSIENT | Exponential backoff (×2) + full jitter | base_delay (default 1.0 s) | 30.0 s |
| CONTEXT_LIMIT | Deterministic | 0.5 s | 0.5 s |
| AUTH / INVALID_REQUEST / PERMANENT | No retry | 0 | |

Jitter is automatic: there is no flag to turn it off.
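To make the table concrete, here is a minimal sketch of how a per-category delay could be computed. It is not the praisonaiagents implementation: the ErrorCategory enum and the retry_delay function are hypothetical names. Because the table lists a floor, the sketch draws uniformly between the floor and the exponential ceiling instead of from zero. The library's exact formula may differ.

```python
import random
from enum import Enum


class ErrorCategory(Enum):
    # Hypothetical stand-in for the library's error categories
    RATE_LIMIT = "rate_limit"
    TRANSIENT = "transient"
    CONTEXT_LIMIT = "context_limit"
    AUTH = "auth"
    INVALID_REQUEST = "invalid_request"
    PERMANENT = "permanent"


# (multiplier, cap in seconds) for the categories that use exponential backoff
BACKOFF = {
    ErrorCategory.RATE_LIMIT: (3, 60.0),
    ErrorCategory.TRANSIENT: (2, 30.0),
}


def retry_delay(category, attempt, base_delay=1.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if category is ErrorCategory.CONTEXT_LIMIT:
        return 0.5  # fixed delay, no jitter
    if category not in BACKOFF:
        return None  # AUTH / INVALID_REQUEST / PERMANENT are never retried
    multiplier, cap = BACKOFF[category]
    # Exponential ceiling, clamped to the per-category cap
    ceiling = min(cap, base_delay * multiplier ** attempt)
    floor = min(base_delay, ceiling)
    # Jitter: draw uniformly between floor and ceiling so that agents
    # failing at the same moment do not all retry at the same moment
    return rng.uniform(floor, ceiling)
```

For a rate-limited call, successive attempts are bounded by ceilings of 1, 3, 9, 27, and then 60 seconds. The actual wait within each bound is random.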

Workflow Timeout

Stop runaway sync workflows that previously ignored workflow_timeout.
PraisonAIAgents(agents=[...], tasks=[...], workflow_timeout=60)
workflow_cancelled is a read-only flag that is set when the timeout fires, which makes it useful for downstream callbacks. Scope change: async workflows already enforced the timeout; sync workflows now do too.
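One way a sync runner can enforce a wall-clock timeout and expose a read-only cancellation flag is sketched below. TimedWorkflow is a hypothetical class, not how praisonaiagents does it. One caveat applies: Python cannot forcibly kill a running thread. This sketch therefore stops waiting at the deadline, raises TimeoutError, sets the flag, and skips any remaining steps. The step that was already running finishes in the background.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class TimedWorkflow:
    """Hypothetical sync runner that enforces a wall-clock workflow_timeout."""

    def __init__(self, steps, workflow_timeout=None):
        self.steps = steps
        self.workflow_timeout = workflow_timeout
        self._cancelled = threading.Event()

    @property
    def workflow_cancelled(self):
        # Read-only: no setter; only the runner itself flips the flag
        return self._cancelled.is_set()

    def _run_steps(self):
        results = []
        for step in self.steps:
            if self._cancelled.is_set():
                break  # cooperative stop: skip any remaining steps
            results.append(step())
        return results

    def start(self):
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(self._run_steps)
        try:
            return future.result(timeout=self.workflow_timeout)
        except FutureTimeout:
            self._cancelled.set()
            raise TimeoutError(
                f"workflow exceeded {self.workflow_timeout}s") from None
        finally:
            pool.shutdown(wait=False)  # return now; do not wait for the stuck step


# The first step outlives the 0.05 s budget, so the second never runs
wf = TimedWorkflow([lambda: time.sleep(0.3), lambda: "never reached"],
                   workflow_timeout=0.05)
try:
    wf.start()
except TimeoutError as err:
    print(err, "| cancelled:", wf.workflow_cancelled)
```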

Task Failure Policies

By default, callback and memory exceptions are logged and swallowed. These flags surface them.
| Param | Type | Default | Effect when True |
|---|---|---|---|
| fail_on_callback_error | bool | False | Re-raises any exception thrown inside task.callback. |
| fail_on_memory_error | bool | False | Re-raises memory-store failures (both inside and after the task). |
from praisonaiagents import Agent, Task

def buggy_callback(task_output):
    raise ValueError("This callback always fails!")

agent = Agent(name="Processor", instructions="Process the data")

# This task will crash the workflow when the callback fails
strict_task = Task(
    description="Process data",
    callback=buggy_callback,
    fail_on_callback_error=True,  # Surface the bug
)

# This task will log the error but continue
lenient_task = Task(
    description="Process data",
    callback=buggy_callback,
    fail_on_callback_error=False,  # Swallow and log
)

# Check non-fatal errors after execution
result = agent.start(lenient_task)
print(f"Callback error: {result.callback_error}")
print(f"All non-fatal errors: {result.non_fatal_errors}")
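The strict/lenient split comes down to a single branch in the exception handler. The sketch below shows that branch in isolation. TaskResult and run_task are hypothetical names that mirror the callback_error and non_fatal_errors attributes used above. They are not praisonaiagents internals. The same pattern would apply to fail_on_memory_error around a memory-store call.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class TaskResult:
    # Hypothetical result type; field names mirror the attributes used above
    output: Any = None
    callback_error: Optional[Exception] = None
    non_fatal_errors: List[Exception] = field(default_factory=list)


def run_task(produce: Callable[[], Any],
             callback: Optional[Callable[[Any], None]] = None,
             fail_on_callback_error: bool = False) -> TaskResult:
    result = TaskResult(output=produce())
    if callback is None:
        return result
    try:
        callback(result.output)
    except Exception as exc:
        if fail_on_callback_error:
            raise  # strict: propagate the original exception unchanged
        # lenient: log, record, and carry on
        logger.warning("Callback failed: %r", exc)
        result.callback_error = exc
        result.non_fatal_errors.append(exc)
    return result
```

Re-raising with a bare raise keeps the original traceback. This matters in CI because the failure should point at the buggy callback, not at the runner.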

Common Patterns

Strict CI mode:
task = Task(
    description="Validate output",
    fail_on_callback_error=True,
    fail_on_memory_error=True,
)
workflow = PraisonAIAgents(
    agents=[Agent(name="Validator", instructions="Check quality")],
    tasks=[task],
    workflow_timeout=60,
)
Lenient production mode:
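task = Task(
    name="generate_content",
    description="Generate content",
    fail_on_callback_error=False,
    fail_on_memory_error=False,
)
workflow = PraisonAIAgents(
    agents=[Agent(name="Writer", instructions="Write clearly")],
    tasks=[task],
    workflow_timeout=300,
)
result = workflow.start()
if result.non_fatal_errors:
    # metrics: your own metrics client (e.g. StatsD), not part of praisonaiagents
    metrics.increment("non_fatal_errors", tags={"task": task.name})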
Multi-agent fan-out:
# Jitter automatically prevents a thundering herd
agents = [Agent(name=f"Worker-{i}") for i in range(10)]
# All agents hitting the same LLM get automatic jitter; no config needed
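The simulation below shows why fan-out benefits from jitter. It models ten agents that are all rate-limited at the same instant and compares when they retry, first without jitter and then with it. The retry_times and peak_load helpers are hypothetical. The parameters (base 1.0 s, ×3 growth, 60 s cap) are taken from the RATE_LIMIT row above, and the draw between floor and ceiling is an assumption about the exact jitter formula.

```python
import random
from collections import Counter


def retry_times(n_agents, attempt, jitter, base=1.0, multiplier=3, cap=60.0, seed=0):
    """Seconds until each agent retries after all were rate-limited together."""
    rng = random.Random(seed)
    ceiling = min(cap, base * multiplier ** attempt)
    if not jitter:
        return [ceiling] * n_agents  # every agent wakes up at the same instant
    return [rng.uniform(base, ceiling) for _ in range(n_agents)]


def peak_load(times, window=0.5):
    """Largest number of retries that land in the same `window`-second slot."""
    return max(Counter(int(t // window) for t in times).values())


herd = peak_load(retry_times(10, attempt=2, jitter=False))
spread = peak_load(retry_times(10, attempt=2, jitter=True))
print(f"without jitter: {herd} simultaneous retries; with jitter: {spread}")
```

Without jitter, all ten retries hit the provider in the same half-second and are likely to be rate-limited again. With jitter, they are spread across the whole backoff window.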

Best Practices

Network calls can hang indefinitely. Always set a reasonable timeout:
# Good: timeout prevents hung workflows
workflow = PraisonAIAgents(agents=[...], tasks=[...], workflow_timeout=300)

# Bad: no timeout, workflow can hang forever
workflow = PraisonAIAgents(agents=[...], tasks=[...])
As a rule of thumb, use 60 s for quick tasks and 300 s for complex multi-step workflows.
Tests should surface bugs immediately; production should be resilient:
import os

from praisonaiagents import Task

# Surface callback bugs in tests; tolerate them in production
fail_on_callback_error = os.getenv("ENV") == "test"

task = Task(
    description="Process data",
    fail_on_callback_error=fail_on_callback_error,
)
Non-fatal errors indicate potential issues that should be tracked:
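import logging

logger = logging.getLogger(__name__)

result = workflow.start()  # workflow and task as defined earlier
for error in result.non_fatal_errors:
    logger.warning(f"Non-fatal error in {task.name}: {error}")
    # metrics: your own metrics client (e.g. StatsD), not part of praisonaiagents
    metrics.increment("task.non_fatal_error", tags={
        "task": task.name,
        "error_type": type(error).__name__,
    })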
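Tracking can go beyond per-error logging. A small helper can summarise non-fatal errors by type and escalate once a run exceeds an error budget. The sketch below uses only the standard library. The track_non_fatal function and its max_allowed budget are a hypothetical policy of your own, not a praisonaiagents feature. It could be fed result.non_fatal_errors.

```python
import logging
from collections import Counter

logger = logging.getLogger("workflow.health")


def track_non_fatal(task_name, errors, max_allowed=None):
    """Log each non-fatal error, count them by type, and enforce an optional budget."""
    counts = Counter(type(err).__name__ for err in errors)
    for err in errors:
        logger.warning("Non-fatal error in %s: %r", task_name, err)
    if max_allowed is not None and len(errors) > max_allowed:
        # Too many "tolerable" failures usually means something is actually broken
        raise RuntimeError(
            f"{task_name}: {len(errors)} non-fatal errors exceeds budget of {max_allowed}")
    return counts
```

The returned Counter maps exception type names to counts, which suits dashboards keyed on error_type.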
