Quick Start
How It Works
| Component | Purpose | Behavior |
|---|---|---|
| Retry Jitter | Prevents thundering herd | Random delays for multi-agent rate limits |
| Workflow Timeout | Stops hung processes | Hard kill after specified seconds |
| Failure Policies | Controls error handling | Surface or swallow exceptions |
Retry Jitter (LLM Backoff)
Prevents a multi-agent thundering herd when many agents hit rate limits at once.
| Error category | Behavior | Floor | Cap |
|---|---|---|---|
| RATE_LIMIT | exp backoff (×3) + full jitter | base_delay (default 1.0) | 60.0s |
| TRANSIENT | exp backoff (×2) + full jitter | base_delay (default 1.0) | 30.0s |
| CONTEXT_LIMIT | deterministic | 0.5s | 0.5s |
| AUTH / INVALID_REQUEST / PERMANENT | no retry | n/a | 0 |
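The table can be read as a small delay function. The sketch below is an illustration only, not the SDK's implementation: the function name retry_delay, the 1-based attempt counter, and the choice to draw the delay uniformly between the floor and the capped exponential ceiling are all assumptions made for this example.

```python
import random

# Per-category settings mirroring the table: (multiplier, cap in seconds).
_BACKOFF = {
    "RATE_LIMIT": (3.0, 60.0),
    "TRANSIENT": (2.0, 30.0),
}
_NO_RETRY = {"AUTH", "INVALID_REQUEST", "PERMANENT"}


def retry_delay(category, attempt, base_delay=1.0, rng=random):
    """Return seconds to wait before retry `attempt` (1-based), or None for no retry.

    Hypothetical helper: the name, signature, and jitter range are assumptions.
    """
    if category in _NO_RETRY:
        return None  # these categories are never retried
    if category == "CONTEXT_LIMIT":
        return 0.5  # deterministic: floor and cap are both 0.5s
    factor, cap = _BACKOFF[category]
    # Exponential growth, clamped to the category cap.
    ceiling = min(cap, base_delay * factor ** attempt)
    # Keep the floor at base_delay without ever exceeding the ceiling.
    floor = min(base_delay, ceiling)
    # Jitter: each caller draws its own delay, which spreads retries apart.
    return rng.uniform(floor, ceiling)
```

Because every agent draws a different delay from the same range, agents that were rate-limited at the same moment are unlikely to retry at the same moment.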
Workflow Timeout
Stops runaway sync workflows, which previously ignored workflow_timeout.
workflow_cancelled is the read-only flag set when a timeout fires (useful for downstream callbacks).
Scope change: async already enforced this; sync now does too.
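To make the relationship between the timeout and the read-only flag concrete, here is a minimal sketch. MiniWorkflow is a hypothetical stand-in, not the SDK's class, and it checks the deadline cooperatively between steps rather than hard-killing a running step, so treat it as a simplified model of the behavior described above.

```python
import time


class MiniWorkflow:
    """Hypothetical stand-in for a sync workflow runner (not the SDK's class)."""

    def __init__(self, steps, workflow_timeout=None):
        self.steps = steps
        self.workflow_timeout = workflow_timeout
        self._cancelled = False

    @property
    def workflow_cancelled(self):
        # Read-only: no setter is defined, so only run() can flip the flag.
        return self._cancelled

    def run(self):
        deadline = None
        if self.workflow_timeout is not None:
            deadline = time.monotonic() + self.workflow_timeout
        results = []
        for step in self.steps:
            # Assumption: the deadline is checked before each step starts.
            if deadline is not None and time.monotonic() >= deadline:
                self._cancelled = True
                break
            results.append(step())
        return results
```

A downstream callback could then read workflow_cancelled to decide whether the results it received are partial.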
Task Failure Policies
By default, callback and memory exceptions are logged and swallowed. These flags surface them.
| Param | Type | Default | Effect when True |
|---|---|---|---|
| fail_on_callback_error | bool | False | Re-raises any exception thrown inside task.callback. |
| fail_on_memory_error | bool | False | Re-raises memory-store failures (both inside and after the task). |
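The difference between swallowing and surfacing can be sketched in a few lines. The helper run_with_callback below is hypothetical and exists only for illustration; collecting swallowed exceptions into a non_fatal_errors list is likewise an assumption about how the two behaviors might fit together.

```python
import logging

logger = logging.getLogger("failure_policy_sketch")


def run_with_callback(action, callback, fail_on_callback_error=False):
    """Run `action`, then `callback(result)`; return (result, non_fatal_errors).

    Hypothetical helper, not part of the SDK.
    """
    result = action()
    non_fatal_errors = []
    try:
        callback(result)
    except Exception as exc:
        if fail_on_callback_error:
            raise  # strict mode: surface the callback failure to the caller
        # Default mode: log the failure, record it, and carry on.
        logger.warning("callback failed: %s", exc)
        non_fatal_errors.append(repr(exc))
    return result, non_fatal_errors
```

With the flag left at False the task result is still returned and the failure is only recorded; with the flag set to True the same exception propagates to the caller.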
Common Patterns
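Strict CI mode: enable the failure-policy flags (for example fail_on_callback_error=True) so that exceptions which would otherwise be swallowed fail the run.
Best Practices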
Set workflow_timeout for any agent that calls external APIs
Network calls can hang indefinitely, so always set a reasonable timeout. Use 60s for quick tasks and 300s for complex multi-step workflows.
Set fail_on_callback_error=True in tests, leave it False in prod
Tests should surface bugs immediately; production should stay resilient.
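One way to keep the two environments apart is to derive the flags from the environment instead of hard-coding them. This is a sketch under stated assumptions: the helper failure_policy_flags is hypothetical, the CI environment variable is a common convention rather than something the SDK reads, and enabling both flags together in CI is a choice made for this example.

```python
import os


def failure_policy_flags(env=None):
    """Return failure-policy keyword arguments based on the environment.

    Hypothetical helper: strict when CI is "1" or "true", lenient otherwise.
    """
    env = os.environ if env is None else env
    strict = env.get("CI", "").lower() in {"1", "true"}
    return {
        "fail_on_callback_error": strict,
        "fail_on_memory_error": strict,
    }
```

The returned dict can be unpacked into whatever constructor accepts these two parameters, so the same code path runs strictly in CI and leniently in production.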
Don't catch jitter-related delays — let the SDK handle backoff
Inspect TaskOutput.non_fatal_errors in your monitoring pipeline
Non-fatal errors indicate potential issues that should be tracked.
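A monitoring hook for this can be very small. The sketch below assumes that non_fatal_errors is an iterable of printable error entries, which the text above does not confirm; report_non_fatal_errors is a hypothetical helper, and SimpleNamespace merely stands in for a real TaskOutput.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("task_monitoring_sketch")


def report_non_fatal_errors(task_output, task_name):
    """Log each non-fatal error on `task_output` and return how many were found.

    Hypothetical helper; tolerates a missing or empty attribute.
    """
    errors = list(getattr(task_output, "non_fatal_errors", None) or [])
    for err in errors:
        logger.warning("task %s non-fatal error: %s", task_name, err)
    return len(errors)


# Stand-in object for illustration only; a real run would pass a TaskOutput.
example_output = SimpleNamespace(non_fatal_errors=["callback failed: boom"])
error_count = report_non_fatal_errors(example_output, "summarize")
```

Returning the count makes it easy to feed a metric or alert threshold in addition to the log lines.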
Related
Task Configuration
Task parameters and configuration options
Process Execution
Workflow execution and management

