Benchmark CLI
The praisonai benchmark command measures latency across PraisonAI's execution paths. It splits each run into import, init, and network time and compares the result against a raw OpenAI SDK baseline.
Quick Start
# Quick comparison of key paths
praisonai benchmark compare "Hi"
# Full benchmark suite (all 8 paths)
praisonai benchmark profile "What is 2+2?"
# Benchmark specific paths
praisonai benchmark agent "Hi"
praisonai benchmark cli "Hi"
praisonai benchmark workflow "Hi"
praisonai benchmark litellm "Hi"
Commands
benchmark profile
Run the full benchmark suite across all execution paths.
praisonai benchmark profile "What is 2+2?" --iterations 3
Options:
--iterations, -n: Number of iterations per path (default: 3)
--format, -f: Output format: text or json (default: text)
--output, -o: Save results to file
Paths benchmarked:
- OpenAI SDK (baseline)
- PraisonAI Agent
- PraisonAI CLI
- PraisonAI CLI with profiling
- PraisonAI Workflow (single agent)
- PraisonAI Workflow (multi-agent)
- PraisonAI via LiteLLM
- LiteLLM standalone
benchmark compare
Quick comparison of key execution paths.
praisonai benchmark compare "Hi" --iterations 2
Compares: OpenAI SDK, PraisonAI Agent, PraisonAI CLI, LiteLLM standalone.
benchmark sdk
Benchmark OpenAI SDK only (baseline).
praisonai benchmark sdk "Hi" --iterations 3 --format json
benchmark agent
Benchmark PraisonAI Agent vs SDK baseline.
praisonai benchmark agent "Hi" --iterations 3
benchmark cli
Benchmark PraisonAI CLI vs SDK baseline.
praisonai benchmark cli "Hi" --iterations 3
benchmark workflow
Benchmark PraisonAI Workflow (single and multi-agent) vs SDK baseline.
praisonai benchmark workflow "Hi" --iterations 3
benchmark litellm
Benchmark LiteLLM paths vs SDK baseline.
praisonai benchmark litellm "Hi" --iterations 3
Text Output (Default)
======================================================================
## Master Comparison Table
+------------------------------+----------+----------+----------+----------+------------+
| Path | Import | Init | Network | Total | Δ SDK |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent | 373ms | 0ms | 808ms | 1182ms | -88ms |
| openai_sdk | 290ms | 40ms | 939ms | 1269ms | baseline |
+------------------------------+----------+----------+----------+----------+------------+
JSON Output
praisonai benchmark agent "Hi" --format json > results.json
{
"timestamp": "2026-01-02T06:14:46.182126Z",
"prompt": "Hi",
"iterations": 3,
"sdk_baseline_ms": 1269.0,
"results": {
"openai_sdk": {
"path_name": "openai_sdk",
"mean_total_ms": 1269.0,
"min_total_ms": 1185.0,
"max_total_ms": 1354.0,
"std_total_ms": 119.0,
"mean_import_ms": 290.0,
"mean_init_ms": 40.0,
"mean_network_ms": 939.0,
"cold_total_ms": 1354.0,
"warm_total_ms": 1185.0,
"delta_vs_sdk_ms": 0.0
}
}
}
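The JSON report uses stable field names, so it is easy to post-process. The minimal sketch below assumes a report with the structure shown above. Only the `openai_sdk` entry is inlined so the example runs on its own, and `summarize` is a hypothetical helper, not part of PraisonAI.

```python
import json

# Trimmed copy of the JSON output above. In practice you would load it with:
#   with open("results.json") as f:
#       report = json.load(f)
REPORT = json.loads("""
{
  "sdk_baseline_ms": 1269.0,
  "results": {
    "openai_sdk": {"mean_total_ms": 1269.0, "std_total_ms": 119.0,
                   "delta_vs_sdk_ms": 0.0}
  }
}
""")


def summarize(report: dict) -> list[str]:
    """One line per path, fastest first: mean, spread, and overhead relative to the SDK."""
    baseline = report["sdk_baseline_ms"]
    lines = []
    for name, r in sorted(report["results"].items(), key=lambda kv: kv[1]["mean_total_ms"]):
        pct = 100.0 * r["delta_vs_sdk_ms"] / baseline
        lines.append(f"{name}: {r['mean_total_ms']:.0f}ms (±{r['std_total_ms']:.0f}ms), {pct:+.1f}% vs SDK")
    return lines


for line in summarize(REPORT):
    print(line)
```

A negative percentage means the path finished faster than the SDK baseline on average.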
Timeline Diagrams
Each benchmark path includes an ASCII timeline diagram showing execution phases:
ENTER ───────────────────────────────────────────────────► RESPONSE
│    import    │init│             network             │
│    373ms     │0ms │              808ms              │
└──────────────┴────┴─────────────────────────────────┘
TOTAL: 1182ms
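The diagram draws each phase as a segment proportional to its share of the total. The minimal sketch below shows that rendering. `render_timeline` is a hypothetical helper, not PraisonAI's own renderer. Segments are widened when a label would not fit, so a very short phase such as a 0ms init stays legible. The phase values shown in a report may be rounded, so their sum (1181ms here) can differ slightly from the reported total (1182ms).

```python
def render_timeline(phases: dict[str, float], width: int = 48) -> str:
    """Render phase timings (ms) as a proportional ASCII bar with a total line."""
    total = sum(phases.values())
    widths = []
    for name, ms in phases.items():
        share = round(width * ms / total) if total else 0
        # Never make a segment narrower than the labels printed inside it.
        widths.append(max(share, len(name), len(f"{ms:.0f}ms")))
    names = "│" + "│".join(n.center(w) for n, w in zip(phases, widths)) + "│"
    times = "│" + "│".join(f"{ms:.0f}ms".center(w) for ms, w in zip(phases.values(), widths)) + "│"
    bottom = "└" + "┴".join("─" * w for w in widths) + "┘"
    return "\n".join([names, times, bottom, f"TOTAL: {total:.0f}ms"])


print(render_timeline({"import": 373, "init": 0, "network": 808}))
```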
Variance Analysis
The benchmark also reports per-path statistics across iterations:
+------------------------------+----------+----------+----------+----------+------------+
| Path | Mean | Min | Max | StdDev | Cold/Warm |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent | 1182ms | 1138ms | 1225ms | 62ms | 1138/1225 |
| openai_sdk | 1269ms | 1185ms | 1354ms | 119ms | 1354/1185 |
+------------------------------+----------+----------+----------+----------+------------+
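Each column is a standard summary statistic over a path's per-iteration totals. The minimal sketch below uses Python's `statistics` module. The sample timings are made up, and the choice of sample standard deviation and the cold/warm definitions are assumptions, not the benchmark's documented behavior.

```python
import statistics


def summarize_runs(totals_ms: list[float]) -> dict[str, float]:
    """Summarize one path's per-iteration total latencies (in ms).

    Assumption: "cold" is the first run and "warm" is the mean of the later runs;
    the benchmark's own definitions may differ.
    """
    if len(totals_ms) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return {
        "mean": statistics.mean(totals_ms),
        "min": min(totals_ms),
        "max": max(totals_ms),
        # Sample standard deviation (n - 1 denominator).
        "stdev": statistics.stdev(totals_ms),
        "cold": totals_ms[0],
        "warm": statistics.mean(totals_ms[1:]),
    }


print(summarize_runs([1300.0, 1200.0, 1250.0]))  # hypothetical timings
```

With only three iterations, the standard deviation is itself noisy. Raise --iterations when two paths are close.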
Overhead Classification
The benchmark classifies overhead into categories:
- Unavoidable: Network latency, TLS handshake, provider response time
- Framework: praisonaiagents import, LiteLLM import/config
- CLI: Subprocess spawn, Python startup, argument parsing
- Profiling: cProfile overhead when --profile is enabled
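You can roughly approximate the split between CLI and Framework overhead with the standard library. Launch a fresh interpreter, time the whole process from the parent, and time the framework import from inside the child. The rough sketch below uses `asyncio` as a stand-in for a heavy import such as `praisonaiagents`. `split_startup_and_import` is a hypothetical helper, not the benchmark's own measurement code.

```python
import subprocess
import sys
import time


def split_startup_and_import(module: str = "asyncio") -> dict[str, float]:
    """Time a fresh interpreter that imports `module`, separating the import from the rest."""
    # The child times only the import statement and prints the result in ms.
    code = (
        "import time; t = time.perf_counter(); "
        f"import {module}; "
        "print((time.perf_counter() - t) * 1000)"
    )
    start = time.perf_counter()
    out = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, check=True)
    wall_ms = (time.perf_counter() - start) * 1000
    import_ms = float(out.stdout.strip())
    # Everything outside the timed import: process spawn, interpreter startup, shutdown.
    return {"wall_ms": wall_ms, "import_ms": import_ms, "startup_ms": wall_ms - import_ms}


print(split_startup_and_import())
```

The remainder after subtracting the import roughly corresponds to the CLI category. Argument parsing is not included here.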
Python API Usage
import json
from praisonai.cli.features.benchmark import BenchmarkHandler
handler = BenchmarkHandler()
# Run full benchmark
report = handler.run_full_benchmark(
    prompt="What is 2+2?",
    iterations=3,
)
# Print report
handler.print_report(report)
# Get comparison table
print(handler.create_comparison_table(report))
# Get variance analysis
print(handler.create_variance_table(report))
# Export to JSON
print(json.dumps(report.to_dict(), indent=2))
Example Script
#!/usr/bin/env python3
"""Benchmark PraisonAI Agent vs OpenAI SDK."""
from praisonai.cli.features.benchmark import BenchmarkHandler
handler = BenchmarkHandler()
# Benchmark agent vs SDK
report = handler.run_full_benchmark(
    prompt="Explain Python in one sentence",
    iterations=3,
    paths=["openai_sdk", "praisonai_agent"],
)
# Show results
for name, result in report.results.items():
    print(f"\n{name}: {result.mean_total_ms:.0f}ms (±{result.std_total_ms:.0f}ms)")
Deep Profiling (--deep)
The --deep flag enables comprehensive cProfile-based profiling, providing:
- Per-function timing with self time and cumulative time
- Call counts for each function
- Module breakdown by category (PraisonAI, Agent, Network, Third-party)
- Call graph data with caller/callee relationships
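The standard library's cProfile and pstats modules expose the same kinds of data listed above. The minimal sketch below is not PraisonAI's implementation. It profiles a toy workload and extracts per-function rows using the field names from the deep-profile JSON schema below. `fib` and `workload` are placeholders for the code under test.

```python
import cProfile
import pstats


def fib(n: int) -> int:
    # Deliberately recursive so the profile shows a high call count.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def workload() -> int:
    return sum(fib(i) for i in range(20))


def top_functions(func, limit: int = 5) -> list[dict]:
    """Profile func() and return the `limit` functions with the highest cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Stats.stats is undocumented but long-standing. It maps
    # (file, line, name) -> (primitive calls, total calls, self secs, cumulative secs, callers).
    rows = []
    for (file, line, name), (_cc, nc, tt, ct, _callers) in pstats.Stats(profiler).stats.items():
        rows.append({
            "name": name,
            "file": file,
            "line": line,
            "calls": nc,
            "total_time_ms": round(tt * 1000, 2),
            "cumulative_time_ms": round(ct * 1000, 2),
        })
    rows.sort(key=lambda r: r["cumulative_time_ms"], reverse=True)
    return rows[:limit]


for row in top_functions(workload):
    print(f"{row['name']:<40} {row['calls']:>6} {row['total_time_ms']:>10.2f} {row['cumulative_time_ms']:>10.2f}")
```

Self time (`total_time_ms`) excludes callees, while cumulative time includes them. A function with small self time but large cumulative time, like `start` in the output below, is usually a thin wrapper around the expensive work.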
Deep Profile Output
## Deep Profile: Top Functions by Cumulative Time
--------------------------------------------------------------------------------
Function                       Calls   Self (ms)  Cumul (ms)
--------------------------------------------------------------------------------
start                              1        0.03      875.69
chat                               1        0.03      875.66
_chat_completion                   1        0.02      875.57
create_completion                  1        0.01      875.54
--------------------------------------------------------------------------------
## Module Breakdown (by cumulative time)
------------------------------------------------------------
PraisonAI Agent Modules:
.../praisonaiagents/agent/agent.py 2626.92ms
Network Modules:
.../openai/_base_client.py 1712.23ms
.../httpx/_client.py 855.20ms
## Call Graph: 1293 edges
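The module breakdown groups per-file cumulative times into the four buckets used in the JSON schema. The path rules below are hypothetical and for illustration only, since the benchmark's actual classification rules are not documented here. They are applied to the file totals from the sample output above.

```python
def categorize(file: str) -> str:
    """Hypothetical rules for assigning a source file to a breakdown bucket."""
    path = file.replace("\\", "/")
    # Directory names are matched with surrounding slashes so that
    # "praisonai" does not also match "praisonaiagents".
    if "/praisonaiagents/" in path:
        return "agent"
    if "/praisonai/" in path:
        return "praisonai"
    if any(f"/{pkg}/" in path for pkg in ("openai", "httpx", "httpcore", "h11")):
        return "network"
    return "third_party"


def module_breakdown(rows: list[tuple[str, float]]) -> dict[str, list[dict]]:
    """Bucket (file, cumulative_ms) pairs, slowest file first within each bucket."""
    buckets: dict[str, list[dict]] = {"praisonai": [], "agent": [], "network": [], "third_party": []}
    for file, cumulative_ms in rows:
        buckets[categorize(file)].append({"file": file, "cumulative_ms": cumulative_ms})
    for entries in buckets.values():
        entries.sort(key=lambda e: e["cumulative_ms"], reverse=True)
    return buckets


rows = [
    (".../praisonaiagents/agent/agent.py", 2626.92),
    (".../openai/_base_client.py", 1712.23),
    (".../httpx/_client.py", 855.20),
]
breakdown = module_breakdown(rows)
for bucket, entries in breakdown.items():
    for entry in entries:
        print(f"{bucket:<12} {entry['file']:<40} {entry['cumulative_ms']:>10.2f}ms")
```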
Deep Profile JSON Schema
{
"results": {
"praisonai_agent": {
"functions": [
{
"name": "start",
"file": "/path/to/agent.py",
"line": 123,
"calls": 1,
"total_time_ms": 0.03,
"cumulative_time_ms": 875.69
}
],
"call_graph": {
"callers": {"func:file:line": ["caller1", "caller2"]},
"callees": {"func:file:line": ["callee1", "callee2"]},
"edge_count": 1293
},
"module_breakdown": {
"praisonai": [{"file": "...", "cumulative_ms": 100.0}],
"agent": [{"file": "...", "cumulative_ms": 200.0}],
"network": [{"file": "...", "cumulative_ms": 300.0}],
"third_party": [{"file": "...", "cumulative_ms": 50.0}]
}
}
}
}
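The `call_graph` fields can be derived from the caller information that cProfile already records. Each profiled function knows its direct callers, and inverting that map gives the callees. The hedged sketch below is not PraisonAI's code. It produces the same `callers`/`callees`/`edge_count` shape with the `func:file:line` key format shown in the schema. `leaf`, `middle`, and `root` are placeholder functions.

```python
import cProfile
import pstats


def leaf(x: int) -> int:
    return x * 2


def middle(x: int) -> int:
    return leaf(x) + leaf(x + 1)


def root() -> int:
    return sum(middle(i) for i in range(10))


def func_key(func: tuple) -> str:
    """Format a pstats (file, line, name) tuple as "func:file:line"."""
    file, line, name = func
    return f"{name}:{file}:{line}"


def call_graph(profiler: cProfile.Profile) -> dict:
    """Build caller and callee adjacency maps from a finished profile."""
    callers: dict[str, list[str]] = {}
    callees: dict[str, list[str]] = {}
    edge_count = 0
    # The last field of each stats entry maps every direct caller to its call counts.
    for func, (_cc, _nc, _tt, _ct, func_callers) in pstats.Stats(profiler).stats.items():
        for caller in func_callers:
            callers.setdefault(func_key(func), []).append(func_key(caller))
            callees.setdefault(func_key(caller), []).append(func_key(func))
            edge_count += 1
    return {"callers": callers, "callees": callees, "edge_count": edge_count}


profiler = cProfile.Profile()
profiler.enable()
root()
profiler.disable()
graph = call_graph(profiler)
print(graph["edge_count"], "edges")
```

Each caller-to-callee pair counts as one edge, however many times the call happened. The edge count therefore measures how much of the code base a run touches, not how often it calls into it.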
Best Practices
- Run multiple iterations: Use at least 3 iterations for reliable statistics
- Account for cold starts: First run is typically slower due to imports
- Use consistent prompts: Same prompt across paths for fair comparison
- Check network variance: Network latency can vary significantly
- Save JSON results: Use --format json for programmatic analysis
- Use --deep for debugging: Deep profiling adds overhead but provides function-level insights
See Also