AI Agents with Context
PraisonAI provides industry-leading context management with smart defaults, lazy loading, and 6 optimization strategies.

| Feature | PraisonAI | LangChain | CrewAI | Agno |
|---|---|---|---|---|
| Smart Defaults | ✅ | ❌ | ❌ | ❌ |
| Lazy Loading (0ms) | ✅ | ❌ | ❌ | ❌ |
| 6 Strategies | ✅ | ❌ | ❌ | ❌ |
| Per-Tool Budgets | ✅ | ❌ | ❌ | ❌ |
| Session Deduplication | ✅ | ❌ | ❌ | ⚠️ |
| LLM Summarization | ✅ | ⚠️ | ❌ | ❌ |
Quick Start
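A minimal example to get started, assuming the `praisonaiagents` package; `context=True` is the flag described throughout this page:

```python
from praisonaiagents import Agent

# context=True enables context management with smart defaults.
# It costs a single boolean assignment at creation time (0ms);
# the ContextManager is only instantiated on first access.
agent = Agent(
    instructions="You are a helpful research assistant.",
    context=True,
)

agent.start("Summarize the key points of this repository.")
```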
What is Context?
Context is everything sent to the LLM in a single API call. It includes:

- System Prompt: agent instructions, role, and goals (~2K tokens)
- Chat History: user/assistant messages (variable)
- Tool Schemas: function definitions (~2K tokens)
- Tool Outputs: results from tool calls (~20K tokens)
- Memory/RAG: retrieved context (~4K tokens)
- Output Reserve: space for the LLM response (~8-16K tokens)
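As a back-of-the-envelope check, here is how those rough figures add up against a 128K window (the numbers are the approximations above, not measurements):

```python
# Approximate per-call budget for a 128K-context model,
# using the rough segment sizes listed above.
system_prompt  = 2_000
tool_schemas   = 2_000
tool_outputs   = 20_000
memory_rag     = 4_000
output_reserve = 16_000

context_window = 128_000
fixed = system_prompt + tool_schemas + tool_outputs + memory_rag + output_reserve
print(context_window - fixed)  # 84000 tokens left for chat history
```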
How Context Flows
Single Agent Flow
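Roughly, a single agent assembles one API call from the segments above; a sketch (the message layout is illustrative, not PraisonAI's internal format):

```python
# Illustrative assembly of a single LLM call from the context segments.
def build_call(system_prompt, memory, history, tool_schemas):
    messages = [{"role": "system", "content": system_prompt + "\n" + memory}]
    messages += history  # prior user/assistant turns, possibly compacted
    return {"messages": messages, "tools": tool_schemas}
```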
Multi-Agent Flow
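In a multi-agent flow, each agent's output feeds the next agent's context, filtered by the sharing policy described under Multi-Agent Policies below. Continuing the sketch above, a hypothetical hand-off:

```python
researcher = Agent(instructions="Research the topic.", context=True)
writer = Agent(instructions="Write a report.", context=True)

# Agent B receives agent A's output; the sharing policy (NONE/SUMMARY/FULL)
# controls how much additional context travels with it.
notes = researcher.start("Collect facts about context windows.")
report = writer.start(f"Write a short report from these notes:\n{notes}")
```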
Optimization Strategies
When context usage exceeds the threshold (80% by default), the optimizer kicks in:

| Strategy | How It Works | Best For |
|---|---|---|
| `truncate` | Remove oldest messages | Simple chatbots |
| `sliding_window` | Keep the N most recent messages | Long conversations |
| `prune_tools` | Truncate old tool outputs | Tool-heavy agents |
| `summarize` | LLM summarizes old context | Critical context |
| `smart` | Combines all strategies | Production use |
| `non_destructive` | Tag for exclusion (undoable) | Audit trails |
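A sketch of selecting a strategy via `ManagerConfig` (documented in the Configuration Reference below); the import path and the way the config attaches to an agent are assumptions:

```python
from praisonaiagents.context import ManagerConfig  # assumed import path

config = ManagerConfig(
    auto_compact=True,       # optimize automatically at the threshold
    compact_threshold=0.8,   # kick in at 80% usage (the default)
    strategy="smart",        # combine all strategies; recommended in production
)

agent = Agent(instructions="...", context=config)  # assumed attachment point
```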
Smart Strategy Flow
Overflow Handling
| Level | Usage | Action |
|---|---|---|
| Normal | < 70% | No action |
| Warning | 70-80% | Monitor |
| Critical | 80-90% | Auto-compact triggers |
| Emergency | 90-95% | Aggressive optimization |
| Overflow | > 95% | Emergency truncation |
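The escalation ladder above, expressed as a lookup for reference:

```python
def overflow_level(usage: float) -> str:
    """Map context usage (0.0-1.0) to the levels in the table above."""
    if usage < 0.70:
        return "normal"     # no action
    if usage < 0.80:
        return "warning"    # monitor
    if usage < 0.90:
        return "critical"   # auto-compact triggers
    if usage < 0.95:
        return "emergency"  # aggressive optimization
    return "overflow"       # emergency truncation
```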
Token Budgeting
The Context Budgeter allocates tokens across segments:

Model Limits
| Model | Context | Output Reserve |
|---|---|---|
| gpt-4o | 128K | 16K |
| gpt-4o-mini | 128K | 16K |
| claude-3-opus | 200K | 8K |
| gemini-1.5-pro | 2M | 8K |
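The budgeting rule is simple: reserve output space first, then let the input segments share the rest. A sketch using the limits above (not the Budgeter's actual API):

```python
MODEL_LIMITS = {  # (context window, output reserve), from the table above
    "gpt-4o":         (128_000, 16_000),
    "gpt-4o-mini":    (128_000, 16_000),
    "claude-3-opus":  (200_000,  8_000),
    "gemini-1.5-pro": (2_000_000, 8_000),
}

def input_budget(model: str) -> int:
    window, reserve = MODEL_LIMITS[model]
    return window - reserve

print(input_budget("gpt-4o"))  # 112000 tokens for input segments
```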
Per-Tool Budgets
Set different limits for different tools:
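A sketch using the `tool_limits` and `protected_tools` options from the Configuration Reference below (same import assumption as above); the tool names are illustrative:

```python
config = ManagerConfig(
    tool_limits={
        "web_search": 2_000,   # verbose output: cap aggressively
        "scrape_page": 3_000,  # web scraping: likewise
        "run_code": 8_000,     # code execution: allow more room
    },
    protected_tools=["final_answer"],  # outputs that are never pruned
)
```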
Session Deduplication

Prevents duplicate content across agents in multi-agent workflows:
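An illustrative sketch of the idea (not PraisonAI's implementation): content already seen in the session is replaced with a short reference instead of being re-sent:

```python
import hashlib

_seen: set[str] = set()

def dedupe(content: str) -> str:
    digest = hashlib.sha256(content.encode()).hexdigest()
    if digest in _seen:
        return f"[duplicate of earlier content {digest[:8]}]"
    _seen.add(digest)
    return content
```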
Multi-Agent Policies

Control how context is shared between agents:

| Mode | Description | Use Case |
|---|---|---|
| `NONE` | No context shared | Independent agents |
| `SUMMARY` | Summarized context | Reduce tokens |
| `FULL` | Full context (bounded) | Continuity needed |
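How the mode is spelled in code is an assumption; conceptually, it is a three-way switch:

```python
from enum import Enum

class SharePolicy(Enum):  # hypothetical name for the three modes above
    NONE = "none"        # independent agents, nothing shared
    SUMMARY = "summary"  # pass a summary to keep tokens down
    FULL = "full"        # pass full (bounded) context for continuity
```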
Context Monitoring
Real-time snapshots for debugging (enable with `monitor_enabled=True`):

Snapshot Output
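The exact snapshot format is not reproduced here; illustratively, a snapshot records a per-segment token ledger and the resulting usage level, roughly like this:

```python
snapshot = {  # illustrative fields, not the real schema
    "model": "gpt-4o",
    "usage": 0.66,  # (segments + output reserve) / 128K window -> "normal"
    "segments": {
        "system_prompt": 2_104,
        "chat_history": 41_337,
        "tool_schemas": 1_988,
        "tool_outputs": 19_455,
        "memory_rag": 3_902,
    },
    "output_reserve": 16_000,
}
```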
Token Estimation
Fast offline token counting (no API calls):

| Content Type | Accuracy |
|---|---|
| English text | ~90-95% |
| Code | ~85-90% |
| Non-ASCII | ~80-85% |
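The estimator runs offline, so it must approximate. A sketch in the spirit of the accuracy table above, using the common ~4-characters-per-token heuristic for English and a smaller divisor for non-ASCII text (this is not PraisonAI's exact estimator):

```python
def estimate_tokens(text: str) -> int:
    ascii_chars = sum(c.isascii() for c in text)
    non_ascii = len(text) - ascii_chars
    # English averages ~4 chars/token; non-ASCII tokenizes less densely.
    return ascii_chars // 4 + non_ascii // 2 + 1
```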
Fast Context (Code Search)
Rapid parallel code search for AI agents:

| Feature | Value |
|---|---|
| Search Latency | 100-200ms |
| Cache Hit | < 1ms |
| Parallel Speedup | 2-5x |
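An illustrative sketch of the pattern behind those numbers (not PraisonAI's implementation): cache repeated queries and fan searches out across files in parallel:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=1024)  # repeat queries become sub-millisecond cache hits
def search_file(path: str, query: str) -> tuple[str, ...]:
    text = Path(path).read_text(errors="ignore")
    return tuple(line for line in text.splitlines() if query in line)

def fast_search(paths: list[str], query: str) -> dict[str, tuple[str, ...]]:
    with ThreadPoolExecutor() as pool:  # parallel fan-out across files
        results = pool.map(lambda p: (p, search_file(p, query)), paths)
    return {path: hits for path, hits in results if hits}
```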
CLI Commands
In-Session Commands
| Command | Description |
|---|---|
| `/context on` | Enable monitoring |
| `/context off` | Disable monitoring |
| `/context stats` | Show the token ledger |
| `/context dump` | Write a snapshot now |
| `/context compact` | Force optimization |
Configuration Reference
ManagerConfig Options
| Option | Type | Default | Description |
|---|---|---|---|
| `auto_compact` | bool | `True` | Auto-optimize at the threshold |
| `compact_threshold` | float | `0.8` | Trigger at this usage fraction (0.8 = 80%) |
| `strategy` | str | `"smart"` | Optimization strategy |
| `output_reserve` | int | Model-specific | Tokens reserved for output |
| `llm_summarize` | bool | `False` | Use the LLM for summarization |
| `tool_limits` | dict | `{}` | Per-tool token limits |
| `protected_tools` | list | `[]` | Tools never pruned |
| `monitor_enabled` | bool | `False` | Enable snapshots |
| `redact_sensitive` | bool | `True` | Redact secrets |
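Putting the table together, a full configuration with every default spelled out (import path assumed, as above):

```python
config = ManagerConfig(
    auto_compact=True,        # optimize automatically at the threshold
    compact_threshold=0.8,    # 0.8 = 80% of the window
    strategy="smart",
    llm_summarize=False,      # set True to summarize with the LLM
    tool_limits={},           # e.g. {"web_search": 2_000}
    protected_tools=[],       # tools whose outputs are never pruned
    monitor_enabled=False,    # set True during development
    redact_sensitive=True,
    # output_reserve defaults to a model-specific value, so it is omitted.
)
```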
Environment Variables
Best Practices
Enable for tool-heavy agents
Always enable `context=True` for agents with tools to prevent token overflow from large search results.

Use smart strategy in production
The `smart` strategy combines all optimization techniques intelligently.

Set per-tool limits
Configure lower limits for verbose tools (search, web scraping) and higher limits for code execution.

Monitor during development
Enable `monitor_enabled=True` to debug context issues and understand token usage.

Use session deduplication
In multi-agent workflows, deduplication prevents the same content from being processed multiple times.
Related Pages
- Memory: persistent storage across sessions
- Knowledge: pre-loaded reference documents
- Context vs Memory: when to use each system
- Context vs Knowledge: runtime vs pre-loaded data
Context management uses lazy loading throughout. Setting `context=True` adds only 1 boolean assignment at creation time (0ms); the ContextManager is only instantiated when first accessed.
