Benchmark CLI

The praisonai benchmark command provides comprehensive performance benchmarking across all PraisonAI execution paths, comparing them against the raw OpenAI SDK baseline.

Quick Start

# Quick comparison of key paths
praisonai benchmark compare "Hi"

# Full benchmark suite (all 8 paths)
praisonai benchmark profile "What is 2+2?"

# Benchmark specific paths
praisonai benchmark agent "Hi"
praisonai benchmark cli "Hi"
praisonai benchmark workflow "Hi"
praisonai benchmark litellm "Hi"

Commands

benchmark profile

Run the full benchmark suite across all execution paths.
praisonai benchmark profile "What is 2+2?" --iterations 3
Options:
  • --iterations, -n: Number of iterations per path (default: 3)
  • --format, -f: Output format: text or json (default: text)
  • --output, -o: Save results to file
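For example, the three options can be combined in a single run to save a longer benchmark as JSON:
praisonai benchmark profile "What is 2+2?" --iterations 5 --format json --output results.json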
Paths benchmarked:
  1. OpenAI SDK (baseline)
  2. PraisonAI Agent
  3. PraisonAI CLI
  4. PraisonAI CLI with profiling
  5. PraisonAI Workflow (single agent)
  6. PraisonAI Workflow (multi-agent)
  7. PraisonAI via LiteLLM
  8. LiteLLM standalone

benchmark compare

Quick comparison of key execution paths.
praisonai benchmark compare "Hi" --iterations 2
Compares: OpenAI SDK, PraisonAI Agent, PraisonAI CLI, LiteLLM standalone.

benchmark sdk

Benchmark OpenAI SDK only (baseline).
praisonai benchmark sdk "Hi" --iterations 3 --format json

benchmark agent

Benchmark PraisonAI Agent vs SDK baseline.
praisonai benchmark agent "Hi" --iterations 3

benchmark cli

Benchmark PraisonAI CLI vs SDK baseline.
praisonai benchmark cli "Hi" --iterations 3

benchmark workflow

Benchmark PraisonAI Workflow (single and multi-agent) vs SDK baseline.
praisonai benchmark workflow "Hi" --iterations 3

benchmark litellm

Benchmark LiteLLM paths vs SDK baseline.
praisonai benchmark litellm "Hi" --iterations 3

Output Formats

Text Output (Default)

======================================================================
## Master Comparison Table
+------------------------------+----------+----------+----------+----------+------------+
| Path                         |   Import |     Init |  Network |    Total |      Δ SDK |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent              |    373ms |      0ms |    808ms |   1182ms |      -88ms |
| openai_sdk                   |    290ms |     40ms |    939ms |   1269ms |   baseline |
+------------------------------+----------+----------+----------+----------+------------+

JSON Output

praisonai benchmark agent "Hi" --format json > results.json
{
  "timestamp": "2026-01-02T06:14:46.182126Z",
  "prompt": "Hi",
  "iterations": 3,
  "sdk_baseline_ms": 1269.0,
  "results": {
    "openai_sdk": {
      "path_name": "openai_sdk",
      "mean_total_ms": 1269.0,
      "min_total_ms": 1185.0,
      "max_total_ms": 1354.0,
      "std_total_ms": 119.0,
      "mean_import_ms": 290.0,
      "mean_init_ms": 40.0,
      "mean_network_ms": 939.0,
      "cold_total_ms": 1354.0,
      "warm_total_ms": 1185.0,
      "delta_vs_sdk_ms": 0.0
    }
  }
}
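The JSON schema above is straightforward to script against. A minimal post-processing sketch (field names taken from the example output; the filename is whatever you redirected to):

import json

# Load results saved with:
#   praisonai benchmark agent "Hi" --format json > results.json
with open("results.json") as f:
    report = json.load(f)

baseline = report["sdk_baseline_ms"]
for name, result in report["results"].items():
    delta = result["mean_total_ms"] - baseline
    print(f"{name}: {result['mean_total_ms']:.0f}ms ({delta:+.0f}ms vs SDK)")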

Timeline Diagrams

Each benchmark path includes an ASCII timeline diagram showing execution phases:
ENTER ───────────────────────────────────────────────────► RESPONSE
      │    import    │ init │             network            │
      │    373ms     │ 0ms  │              808ms             │
      └──────────────┴──────┴────────────────────────────────┘
                                                 TOTAL: 1182ms

Variance Analysis

The benchmark includes statistical analysis:
+------------------------------+----------+----------+----------+----------+------------+
| Path                         |     Mean |      Min |      Max |   StdDev |  Cold/Warm |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent              |   1182ms |   1138ms |   1225ms |     62ms |  1138/1225 |
| openai_sdk                   |   1269ms |   1185ms |   1354ms |    119ms |  1354/1185 |
+------------------------------+----------+----------+----------+----------+------------+
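The Mean/Min/Max/StdDev columns correspond to standard statistics over per-iteration totals. A minimal sketch using Python's statistics module (the timing values here are illustrative, not measured, and whether the tool uses sample or population standard deviation is not specified; statistics.stdev computes the sample form):

import statistics

# Illustrative per-iteration totals in ms for one path; real values
# come from the benchmark's own runs.
totals = [1354.0, 1185.0, 1268.0]

print(f"Mean:   {statistics.mean(totals):.0f}ms")
print(f"Min:    {min(totals):.0f}ms")
print(f"Max:    {max(totals):.0f}ms")
print(f"StdDev: {statistics.stdev(totals):.0f}ms")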

Overhead Classification

The benchmark classifies overhead into categories:
  • Unavoidable: Network latency, TLS handshake, provider response time
  • Framework: praisonaiagents import, LiteLLM import/config
  • CLI: Subprocess spawn, Python startup, argument parsing
  • Profiling: cProfile overhead when --profile enabled
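To sanity-check the framework-import component in isolation, you can time a cold import directly. A rough sketch (not part of the benchmark tool, and only meaningful in a fresh interpreter, since repeated imports hit the module cache):

import importlib
import time

start = time.perf_counter()
importlib.import_module("praisonaiagents")  # the framework import classified above
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"praisonaiagents import: {elapsed_ms:.0f}ms")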

Python API Usage

from praisonai.cli.features.benchmark import BenchmarkHandler

handler = BenchmarkHandler()

# Run full benchmark
report = handler.run_full_benchmark(
    prompt="What is 2+2?",
    iterations=3,
)

# Print report
handler.print_report(report)

# Get comparison table
print(handler.create_comparison_table(report))

# Get variance analysis
print(handler.create_variance_table(report))

# Export to JSON
import json
print(json.dumps(report.to_dict(), indent=2))

Example Script

#!/usr/bin/env python3
"""Benchmark PraisonAI Agent vs OpenAI SDK."""

from praisonai.cli.features.benchmark import BenchmarkHandler

handler = BenchmarkHandler()

# Benchmark agent vs SDK
report = handler.run_full_benchmark(
    prompt="Explain Python in one sentence",
    iterations=3,
    paths=["openai_sdk", "praisonai_agent"],
)

# Show results
for name, result in report.results.items():
    print(f"\n{name}: {result.mean_total_ms:.0f}ms (±{result.std_total_ms:.0f}ms)")

Deep Profiling (--deep)

The --deep flag enables comprehensive cProfile-based profiling, providing:
  • Per-function timing with self time and cumulative time
  • Call counts for each function
  • Module breakdown by category (PraisonAI, Agent, Network, Third-party)
  • Call graph data with caller/callee relationships
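This data is gathered with cProfile-style instrumentation. For intuition, a minimal standalone sketch using the standard cProfile/pstats modules (the workload is a placeholder, not the benchmark's internals):

import cProfile
import io
import pstats

def work():
    # Placeholder workload; in the benchmark this would be an agent call.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time, mirroring the "Top Functions" table below.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())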

Deep Profile Output

## Deep Profile: Top Functions by Cumulative Time
--------------------------------------------------------------------------------
Function                                         Calls    Self (ms)   Cumul (ms)
--------------------------------------------------------------------------------
start                                                1         0.03       875.69
chat                                                 1         0.03       875.66
_chat_completion                                     1         0.02       875.57
create_completion                                    1         0.01       875.54
--------------------------------------------------------------------------------

## Module Breakdown (by cumulative time)
------------------------------------------------------------
PraisonAI Agent Modules:
  .../praisonaiagents/agent/agent.py                        2626.92ms

Network Modules:
  .../openai/_base_client.py                                1712.23ms
  .../httpx/_client.py                                       855.20ms

## Call Graph: 1293 edges

Deep Profile JSON Schema

{
  "results": {
    "praisonai_agent": {
      "functions": [
        {
          "name": "start",
          "file": "/path/to/agent.py",
          "line": 123,
          "calls": 1,
          "total_time_ms": 0.03,
          "cumulative_time_ms": 875.69
        }
      ],
      "call_graph": {
        "callers": {"func:file:line": ["caller1", "caller2"]},
        "callees": {"func:file:line": ["callee1", "callee2"]},
        "edge_count": 1293
      },
      "module_breakdown": {
        "praisonai": [{"file": "...", "cumulative_ms": 100.0}],
        "agent": [{"file": "...", "cumulative_ms": 200.0}],
        "network": [{"file": "...", "cumulative_ms": 300.0}],
        "third_party": [{"file": "...", "cumulative_ms": 50.0}]
      }
    }
  }
}
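A short sketch of consuming this schema, e.g. to list the heaviest functions (the filename is hypothetical; keys follow the schema above):

import json

with open("deep_results.json") as f:  # hypothetical output filename
    data = json.load(f)

functions = data["results"]["praisonai_agent"]["functions"]
top = sorted(functions, key=lambda fn: fn["cumulative_time_ms"], reverse=True)[:5]
for fn in top:
    print(f"{fn['name']:<30} {fn['calls']:>6} calls  "
          f"{fn['cumulative_time_ms']:>10.2f}ms cumulative")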

Best Practices

  1. Run multiple iterations: Use at least 3 iterations for reliable statistics
  2. Account for cold starts: First run is typically slower due to imports
  3. Use consistent prompts: Same prompt across paths for fair comparison
  4. Check network variance: Network latency can vary significantly
  5. Save JSON results: Use --format json for programmatic analysis
  6. Use --deep for debugging: Deep profiling adds overhead but provides function-level insights

See Also