Benchmark CLI

The praisonai benchmark command provides comprehensive performance benchmarking across all PraisonAI execution paths, comparing them against the raw OpenAI SDK baseline.

Quick Start

# Quick comparison of key paths
praisonai benchmark compare "Hi"

# Full benchmark suite (all 8 paths)
praisonai benchmark profile "What is 2+2?"

# Benchmark specific paths
praisonai benchmark agent "Hi"
praisonai benchmark cli "Hi"
praisonai benchmark workflow "Hi"
praisonai benchmark litellm "Hi"

Commands

benchmark profile

Run the full benchmark suite across all execution paths.
praisonai benchmark profile "What is 2+2?" --iterations 3
Options:
  • --iterations, -n: Number of iterations per path (default: 3)
  • --format, -f: Output format: text or json (default: text)
  • --output, -o: Save results to file
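For example, the three options can be combined in a single run to save a longer benchmark as JSON:
praisonai benchmark profile "What is 2+2?" --iterations 5 --format json --output results.json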
Paths benchmarked:
  1. OpenAI SDK (baseline)
  2. PraisonAI Agent
  3. PraisonAI CLI
  4. PraisonAI CLI with profiling
  5. PraisonAI Workflow (single agent)
  6. PraisonAI Workflow (multi-agent)
  7. PraisonAI via LiteLLM
  8. LiteLLM standalone

benchmark compare

Quick comparison of key execution paths.
praisonai benchmark compare "Hi" --iterations 2
Compares: OpenAI SDK, PraisonAI Agent, PraisonAI CLI, LiteLLM standalone.

benchmark sdk

Benchmark OpenAI SDK only (baseline).
praisonai benchmark sdk "Hi" --iterations 3 --format json

benchmark agent

Benchmark PraisonAI Agent vs SDK baseline.
praisonai benchmark agent "Hi" --iterations 3

benchmark cli

Benchmark PraisonAI CLI vs SDK baseline.
praisonai benchmark cli "Hi" --iterations 3

benchmark workflow

Benchmark PraisonAI Workflow (single and multi-agent) vs SDK baseline.
praisonai benchmark workflow "Hi" --iterations 3

benchmark litellm

Benchmark LiteLLM paths vs SDK baseline.
praisonai benchmark litellm "Hi" --iterations 3

Output Formats

Text Output (Default)

======================================================================
## Master Comparison Table
+------------------------------+----------+----------+----------+----------+------------+
| Path                         |   Import |     Init |  Network |    Total |      Δ SDK |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent              |    373ms |      0ms |    808ms |   1182ms |      -88ms |
| openai_sdk                   |    290ms |     40ms |    939ms |   1269ms |   baseline |
+------------------------------+----------+----------+----------+----------+------------+

JSON Output

praisonai benchmark agent "Hi" --format json > results.json
{
  "timestamp": "2026-01-02T06:14:46.182126Z",
  "prompt": "Hi",
  "iterations": 3,
  "sdk_baseline_ms": 1269.0,
  "results": {
    "openai_sdk": {
      "path_name": "openai_sdk",
      "mean_total_ms": 1269.0,
      "min_total_ms": 1185.0,
      "max_total_ms": 1354.0,
      "std_total_ms": 119.0,
      "mean_import_ms": 290.0,
      "mean_init_ms": 40.0,
      "mean_network_ms": 939.0,
      "cold_total_ms": 1354.0,
      "warm_total_ms": 1185.0,
      "delta_vs_sdk_ms": 0.0
    }
  }
}
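The JSON schema above is straightforward to script against. A minimal post-processing sketch (field names taken from the example output; the filename is whatever you redirected to):

import json

# Load results saved with:
#   praisonai benchmark agent "Hi" --format json > results.json
with open("results.json") as f:
    report = json.load(f)

baseline = report["sdk_baseline_ms"]
for name, result in report["results"].items():
    delta = result["mean_total_ms"] - baseline
    print(f"{name}: {result['mean_total_ms']:.0f}ms ({delta:+.0f}ms vs SDK)")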

Timeline Diagrams

Each benchmark path includes an ASCII timeline diagram showing execution phases:
ENTER ───────────────────────────────────────────────────► RESPONSE
      │    import    │ init │             network            │
      │    373ms     │ 0ms  │              808ms             │
      └──────────────┴──────┴────────────────────────────────┘
                                                 TOTAL: 1182ms

Variance Analysis

The benchmark includes statistical analysis:
+------------------------------+----------+----------+----------+----------+------------+
| Path                         |     Mean |      Min |      Max |   StdDev |  Cold/Warm |
+------------------------------+----------+----------+----------+----------+------------+
| praisonai_agent              |   1182ms |   1138ms |   1225ms |     62ms |  1138/1225 |
| openai_sdk                   |   1269ms |   1185ms |   1354ms |    119ms |  1354/1185 |
+------------------------------+----------+----------+----------+----------+------------+
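The Mean/Min/Max/StdDev columns correspond to standard statistics over per-iteration totals. A minimal sketch using Python's statistics module (the timing values here are illustrative, not measured, and whether the tool uses sample or population standard deviation is not specified; statistics.stdev computes the sample form):

import statistics

# Illustrative per-iteration totals in ms for one path; real values
# come from the benchmark's own runs.
totals = [1354.0, 1185.0, 1268.0]

print(f"Mean:   {statistics.mean(totals):.0f}ms")
print(f"Min:    {min(totals):.0f}ms")
print(f"Max:    {max(totals):.0f}ms")
print(f"StdDev: {statistics.stdev(totals):.0f}ms")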

Overhead Classification

The benchmark classifies overhead into categories:
  • Unavoidable: Network latency, TLS handshake, provider response time
  • Framework: praisonaiagents import, LiteLLM import/config
  • CLI: Subprocess spawn, Python startup, argument parsing
  • Profiling: cProfile overhead when --profile enabled
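To sanity-check the framework-import component in isolation, you can time a cold import directly. A rough sketch (not part of the benchmark tool, and only meaningful in a fresh interpreter, since repeated imports hit the module cache):

import importlib
import time

start = time.perf_counter()
importlib.import_module("praisonaiagents")  # the framework import classified above
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"praisonaiagents import: {elapsed_ms:.0f}ms")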

Python API Usage

from praisonai.cli.features.benchmark import BenchmarkHandler

handler = BenchmarkHandler()

# Run full benchmark
report = handler.run_full_benchmark(
    prompt="What is 2+2?",
    iterations=3,
)

# Print report
handler.print_report(report)

# Get comparison table
print(handler.create_comparison_table(report))

# Get variance analysis
print(handler.create_variance_table(report))

# Export to JSON
import json
print(json.dumps(report.to_dict(), indent=2))

Example Script

#!/usr/bin/env python3
"""Benchmark PraisonAI Agent vs OpenAI SDK."""

from praisonai.cli.features.benchmark import BenchmarkHandler

handler = BenchmarkHandler()

# Benchmark agent vs SDK
report = handler.run_full_benchmark(
    prompt="Explain Python in one sentence",
    iterations=3,
    paths=["openai_sdk", "praisonai_agent"],
)

# Show results
for name, result in report.results.items():
    print(f"\n{name}: {result.mean_total_ms:.0f}ms (±{result.std_total_ms:.0f}ms)")

Deep Profiling (--deep)

The --deep flag enables comprehensive cProfile-based profiling, providing:
  • Per-function timing with self time and cumulative time
  • Call counts for each function
  • Module breakdown by category (PraisonAI, Agent, Network, Third-party)
  • Call graph data with caller/callee relationships
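This data is gathered with cProfile-style instrumentation. For intuition, a minimal standalone sketch using the standard cProfile/pstats modules (the workload is a placeholder, not the benchmark's internals):

import cProfile
import io
import pstats

def work():
    # Placeholder workload; in the benchmark this would be an agent call.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time, mirroring the "Top Functions" table below.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())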

Deep Profile Output

## Deep Profile: Top Functions by Cumulative Time
--------------------------------------------------------------------------------
Function                                         Calls    Self (ms)   Cumul (ms)
--------------------------------------------------------------------------------
start                                                1         0.03       875.69
chat                                                 1         0.03       875.66
_chat_completion                                     1         0.02       875.57
create_completion                                    1         0.01       875.54
--------------------------------------------------------------------------------

## Module Breakdown (by cumulative time)
------------------------------------------------------------
PraisonAI Agent Modules:
  .../praisonaiagents/agent/agent.py                        2626.92ms

Network Modules:
  .../openai/_base_client.py                                1712.23ms
  .../httpx/_client.py                                       855.20ms

## Call Graph: 1293 edges

Deep Profile JSON Schema

{
  "results": {
    "praisonai_agent": {
      "functions": [
        {
          "name": "start",
          "file": "/path/to/agent.py",
          "line": 123,
          "calls": 1,
          "total_time_ms": 0.03,
          "cumulative_time_ms": 875.69
        }
      ],
      "call_graph": {
        "callers": {"func:file:line": ["caller1", "caller2"]},
        "callees": {"func:file:line": ["callee1", "callee2"]},
        "edge_count": 1293
      },
      "module_breakdown": {
        "praisonai": [{"file": "...", "cumulative_ms": 100.0}],
        "agent": [{"file": "...", "cumulative_ms": 200.0}],
        "network": [{"file": "...", "cumulative_ms": 300.0}],
        "third_party": [{"file": "...", "cumulative_ms": 50.0}]
      }
    }
  }
}
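A short sketch of consuming this schema, e.g. to list the heaviest functions (the filename is hypothetical; keys follow the schema above):

import json

with open("deep_results.json") as f:  # hypothetical output filename
    data = json.load(f)

functions = data["results"]["praisonai_agent"]["functions"]
top = sorted(functions, key=lambda fn: fn["cumulative_time_ms"], reverse=True)[:5]
for fn in top:
    print(f"{fn['name']:<30} {fn['calls']:>6} calls  "
          f"{fn['cumulative_time_ms']:>10.2f}ms cumulative")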

Best Practices

  1. Run multiple iterations: Use at least 3 iterations for reliable statistics
  2. Account for cold starts: First run is typically slower due to imports
  3. Use consistent prompts: Same prompt across paths for fair comparison
  4. Check network variance: Network latency can vary significantly
  5. Save JSON results: Use --format json for programmatic analysis
  6. Use --deep for debugging: Deep profiling adds overhead but provides function-level insights

See Also