Benchmark CLI
The `praisonai benchmark` command provides comprehensive performance benchmarking across all PraisonAI execution paths, comparing them against the raw OpenAI SDK baseline.
Quick Start
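A few typical invocations, using the subcommands and flags documented below (a sketch; verify against your installed version):

```shell
# Full benchmark suite across all execution paths, 5 iterations each
praisonai benchmark profile --iterations 5

# Quick comparison of key execution paths
praisonai benchmark compare

# Save machine-readable results for later analysis
praisonai benchmark profile --format json --output results.json
```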
Commands
benchmark profile
Run the full benchmark suite across all execution paths.
Options:
- `--iterations, -n`: Number of iterations per path (default: 3)
- `--format, -f`: Output format: `text` or `json` (default: `text`)
- `--output, -o`: Save results to a file

The following execution paths are benchmarked:
- OpenAI SDK (baseline)
- PraisonAI Agent
- PraisonAI CLI
- PraisonAI CLI with profiling
- PraisonAI Workflow (single agent)
- PraisonAI Workflow (multi-agent)
- PraisonAI via LiteLLM
- LiteLLM standalone
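For example, combining the flags documented above (illustrative invocation):

```shell
# Run 10 iterations per path and write JSON results to disk
praisonai benchmark profile -n 10 -f json -o profile-results.json
```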
benchmark compare
Quick comparison of key execution paths.
benchmark sdk
Benchmark OpenAI SDK only (baseline).
benchmark agent
Benchmark PraisonAI Agent vs SDK baseline.
benchmark cli
Benchmark PraisonAI CLI vs SDK baseline.
benchmark workflow
Benchmark PraisonAI Workflow (single and multi-agent) vs SDK baseline.
benchmark litellm
Benchmark LiteLLM paths vs SDK baseline.
Output Formats
Text Output (Default)
JSON Output
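The exact JSON schema depends on the PraisonAI version. As an illustration of programmatic analysis, the sketch below assumes each result record carries a path name and a mean latency in seconds; the field names (`results`, `path`, `mean_s`) are hypothetical:

```python
import json

# Hypothetical benchmark output; real field names may differ.
raw = json.dumps({
    "results": [
        {"path": "openai_sdk", "mean_s": 0.80},
        {"path": "praisonai_agent", "mean_s": 0.95},
    ]
})

data = json.loads(raw)
# Treat the raw SDK path as the baseline and report overhead per path
baseline = next(r for r in data["results"] if r["path"] == "openai_sdk")
for r in data["results"]:
    overhead_ms = (r["mean_s"] - baseline["mean_s"]) * 1000
    print(f"{r['path']}: +{overhead_ms:.0f} ms vs SDK baseline")
```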
Timeline Diagrams
Each benchmark path includes an ASCII timeline diagram showing execution phases.
Variance Analysis
The benchmark includes statistical analysis of the per-iteration timings.
Overhead Classification
The benchmark classifies overhead into categories:
- Unavoidable: Network latency, TLS handshake, provider response time
- Framework: praisonaiagents import, LiteLLM import/config
- CLI: Subprocess spawn, Python startup, argument parsing
- Profiling: cProfile overhead when `--profile` is enabled
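The statistical analysis mentioned under Variance Analysis can be sketched with the standard library (illustrative metrics; the actual fields reported by the benchmark may differ):

```python
import statistics

# Example per-iteration wall-clock times for one path, in seconds
timings = [1.20, 1.35, 1.28]

mean = statistics.mean(timings)
stdev = statistics.stdev(timings)   # sample standard deviation
cv = stdev / mean * 100             # coefficient of variation, in percent

print(f"mean={mean:.3f}s stdev={stdev:.3f}s cv={cv:.1f}%")
```

A low coefficient of variation across iterations suggests network latency was stable enough for a fair comparison.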
Python API Usage
Example Script
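The real benchmark API is not reproduced here; as a sketch of the underlying pattern, the snippet below times a callable over N iterations and compares means against a baseline. All names (`bench`, `sdk_call`, `agent_call`) are illustrative stand-ins, not the actual PraisonAI API:

```python
import statistics
import time

def bench(fn, iterations=3):
    """Time fn() over several iterations; return per-run seconds."""
    runs = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        runs.append(time.perf_counter() - start)
    return runs

# Stand-ins for real paths (e.g. a raw OpenAI SDK call vs an Agent run)
def sdk_call():
    time.sleep(0.01)

def agent_call():
    time.sleep(0.015)

sdk = statistics.mean(bench(sdk_call))
agent = statistics.mean(bench(agent_call))
print(f"agent overhead vs baseline: {(agent - sdk) * 1000:.1f} ms")
```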
Deep Profiling (--deep)
The `--deep` flag enables comprehensive cProfile-based profiling, providing:
- Per-function timing with self time and cumulative time
- Call counts for each function
- Module breakdown by category (PraisonAI, Agent, Network, Third-party)
- Call graph data with caller/callee relationships
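The cProfile/pstats machinery behind this kind of deep profiling can be sketched with the standard library alone (the benchmark's actual implementation may differ):

```python
import cProfile
import io
import pstats

def work():
    # Stand-in for the agent run being profiled
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Per-function stats: ncalls = call counts,
# tottime = self time, cumtime = cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # top 5 functions
print(stream.getvalue())
```

Caller/callee relationships (the call graph data listed above) are available from the same `pstats.Stats` object via `print_callers()` and `print_callees()`.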
Deep Profile Output
Deep Profile JSON Schema
Best Practices
- Run multiple iterations: Use at least 3 iterations for reliable statistics
- Account for cold starts: First run is typically slower due to imports
- Use consistent prompts: Same prompt across paths for fair comparison
- Check network variance: Network latency can vary significantly
- Save JSON results: Use `--format json` for programmatic analysis
- Use `--deep` for debugging: Deep profiling adds overhead but provides function-level insights
See Also
- Profile Command - Detailed function-level profiling
- Doctor Command - Health checks and diagnostics

