EvaluationLoop implements the “Ralph Loop” pattern: run an agent, judge the output, provide feedback, and repeat until the quality threshold is met.
How It Works
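The loop itself is simple: run the agent, score the output against your criteria with the Judge, and if the score is below the threshold, feed the findings back into the next attempt. Below is a minimal, purely illustrative sketch of that cycle (not EvaluationLoop's actual implementation; `run_agent` and `judge_output` are hypothetical stand-ins for the agent call and the Judge):

```python
def ralph_loop(run_agent, judge_output, task, criteria, threshold=8.0, max_iterations=5):
    """Illustrative sketch of the Ralph Loop cycle, not EvaluationLoop's real internals."""
    findings = []
    prompt = task
    for iteration in range(1, max_iterations + 1):
        output = run_agent(prompt)                        # 1. run the agent
        score, feedback = judge_output(output, criteria)  # 2. judge the output against the criteria
        findings.append(feedback)
        if score >= threshold:                            # 3. stop once the quality threshold is met
            return output, score, findings
        # 4. otherwise, fold the accumulated findings back into the next prompt and repeat
        prompt = f"{task}\n\nAddress this feedback:\n" + "\n".join(findings)
    return output, score, findings
```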
Quick Start
Agent.run_until()

```python
from praisonaiagents import Agent

agent = Agent(name="analyzer", instructions="Analyze systems thoroughly")

result = agent.run_until(
    "Analyze the authentication flow",
    criteria="Analysis is thorough and actionable",
    threshold=8.0,
)

print(f"Score: {result.final_score}/10")
print(f"Success: {result.success}")
```
EvaluationLoop Class

```python
from praisonaiagents import Agent
from praisonaiagents.eval import EvaluationLoop

agent = Agent(name="writer", instructions="Write compelling content")

loop = EvaluationLoop(
    agent=agent,
    criteria="Content is engaging and well-structured",
    threshold=8.0,
    max_iterations=5,
)

result = loop.run("Write a product description for an AI assistant")
print(result.final_report)
```
With Callback

```python
from praisonaiagents import Agent

def on_iteration(iteration_result):
    print(f"Iteration {iteration_result.iteration}: {iteration_result.score}/10")

agent = Agent(name="coder", instructions="Write clean code")

result = agent.run_until(
    "Write a function to validate email addresses",
    criteria="Code is correct, readable, and handles edge cases",
    on_iteration=on_iteration,
)
```
Modes
Optimize Mode
Stops as soon as the threshold is met. Best for production use.

```python
loop = EvaluationLoop(
    agent=agent,
    criteria="...",
    mode="optimize",  # default
)
```
Review Mode
Runs all iterations regardless of score. Best for analysis.

```python
loop = EvaluationLoop(
    agent=agent,
    criteria="...",
    mode="review",
)
```
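Review mode is most useful when you want to study how the output evolves across attempts; since every iteration is kept, you can compare scores and findings after the run. A usage sketch, assuming the result fields documented under Results below:

```python
from praisonaiagents import Agent
from praisonaiagents.eval import EvaluationLoop

agent = Agent(name="writer", instructions="Write compelling content")

loop = EvaluationLoop(
    agent=agent,
    criteria="Content is engaging and well-structured",
    max_iterations=3,
    mode="review",
)
result = loop.run("Write a product description for an AI assistant")

# Review mode keeps every iteration, so you can compare how the output evolved
print(result.score_history)  # e.g. [6.0, 7.2, 8.5]
for iteration in result.iterations:
    print(iteration.iteration, iteration.score, iteration.findings)
```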
Configuration
- `agent`: The Agent instance to evaluate
- `criteria`: Evaluation criteria for the Judge (e.g., "Response is thorough and accurate")
- `threshold`: Score threshold for success (1-10 scale)
- `max_iterations`: Maximum number of iterations before stopping
- `mode`: `"optimize"` (stop on success) or `"review"` (run all iterations)
- `on_iteration`: Optional callback called after each iteration with the IterationResult
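Putting the documented options together, a fully configured loop might look like the sketch below (the agent name, instructions, and task string are illustrative):

```python
from praisonaiagents import Agent
from praisonaiagents.eval import EvaluationLoop

def on_iteration(iteration_result):
    # Called after every iteration with an IterationResult
    print(f"Iteration {iteration_result.iteration}: {iteration_result.score}/10")

agent = Agent(name="reviewer", instructions="Review technical documents carefully")

loop = EvaluationLoop(
    agent=agent,                                   # the Agent instance to evaluate
    criteria="Response is thorough and accurate",  # what the Judge scores against
    threshold=8.0,                                 # 1-10 success threshold
    max_iterations=5,                              # upper bound on attempts
    mode="optimize",                               # stop on success; "review" runs all iterations
    on_iteration=on_iteration,                     # per-iteration callback
)

result = loop.run("Review the API design document")
print(result.final_report)
```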
Results
EvaluationLoopResult
- `success`: Whether the loop achieved the threshold
- `final_score`: Score from the last iteration (1-10)
- `score_history`: All scores across iterations
- Output from the last iteration
- `accumulated_findings`: All findings/suggestions collected
- `num_iterations`: Number of iterations completed
```python
result = agent.run_until("Analyze the codebase", criteria="...")

# Access results
print(result.success)               # True
print(result.final_score)           # 8.5
print(result.score_history)         # [6.0, 7.2, 8.5]
print(result.num_iterations)        # 3
print(result.accumulated_findings)  # ["Consider edge cases", ...]

# Generate report
print(result.final_report)  # Markdown report

# Serialize
print(result.to_json())  # JSON string
print(result.to_dict())  # Dictionary
```
IterationResult
Each iteration produces an IterationResult:
```python
for iteration in result.iterations:
    print(f"Iteration {iteration.iteration}")
    print(f"  Score: {iteration.score}/10")
    print(f"  Reasoning: {iteration.reasoning}")
    print(f"  Findings: {iteration.findings}")
    print(f"  Output: {iteration.output[:100]}...")
```
Async Support
```python
import asyncio

from praisonaiagents import Agent
from praisonaiagents.eval import EvaluationLoop

async def main():
    agent = Agent(name="analyzer", instructions="Analyze systems")

    # Using EvaluationLoop directly
    loop = EvaluationLoop(agent=agent, criteria="Analysis is thorough")
    result = await loop.run_async("Analyze the auth flow")

    # Or using the Agent method
    result = await agent.run_until_async(
        "Analyze the auth flow",
        criteria="Analysis is thorough",
    )
    print(result.final_score)

asyncio.run(main())
```
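Because run_until_async is awaitable, several evaluations can also run concurrently with asyncio.gather. A sketch, using a separate Agent instance per task since sharing one instance across concurrent loops is not covered here:

```python
import asyncio
from praisonaiagents import Agent

async def main():
    # One agent per task; sharing a single instance across concurrent loops is an untested assumption
    auth_agent = Agent(name="auth-analyzer", instructions="Analyze systems")
    pay_agent = Agent(name="payment-analyzer", instructions="Analyze systems")

    results = await asyncio.gather(
        auth_agent.run_until_async("Analyze the auth flow", criteria="Analysis is thorough"),
        pay_agent.run_until_async("Analyze the payment flow", criteria="Analysis is thorough"),
    )
    for result in results:
        print(result.final_score, result.success)

asyncio.run(main())
```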
Best Practices
Write Specific Criteria
Be specific in your criteria to get consistent results:

```python
# ❌ Vague
criteria = "Response is good"

# ✅ Specific
criteria = "Response includes: 1) Clear problem statement, 2) Step-by-step solution, 3) Code examples"
```
Set Appropriate Thresholds
- 8.0 (default): Good for most use cases
- 9.0+: High quality, may require more iterations
- 7.0: Acceptable quality, faster completion
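For example, the same task can be run against a stricter or looser bar, trading iteration count for quality (a sketch with illustrative values):

```python
from praisonaiagents import Agent

agent = Agent(name="summarizer", instructions="Summarize reports precisely")

# Stricter bar: higher quality, likely more iterations
strict = agent.run_until(
    "Summarize the incident report",
    criteria="Summary is accurate, complete, and actionable",
    threshold=9.0,
)

# Looser bar: acceptable quality, faster completion
quick = agent.run_until(
    "Summarize the incident report",
    criteria="Summary is accurate, complete, and actionable",
    threshold=7.0,
)
```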
Limit Iterations for Cost Control
Each iteration makes LLM calls. Set max_iterations based on your budget:

```python
loop = EvaluationLoop(
    agent=agent,
    criteria="...",
    max_iterations=3,  # Limit for cost control
)
```
Use Callbacks for Monitoring
Track progress in real time:

```python
def on_iteration(r):
    print(f"[{r.iteration}] Score: {r.score} - {r.reasoning[:50]}...")
    if r.score < 6:
        print("⚠️ Low score, may need more iterations")

result = agent.run_until("...", criteria="...", on_iteration=on_iteration)
```
EvaluationLoop uses lazy loading: the Judge is imported only when you actually run an evaluation, so there is no performance impact when the feature is not in use.