
AI Agents with Context

PraisonAI provides industry-leading context management with smart defaults, lazy loading, and 6 optimization strategies.
Key features:

- Smart Defaults
- Lazy Loading (0ms)
- 6 Strategies
- Per-Tool Budgets
- Session Deduplication
- LLM Summarization

Quick Start

```python
from praisonaiagents import Agent

# Enable with defaults (auto-enabled when tools present)
agent = Agent(
    instructions="You are helpful",
    context=True
)
```

What is Context?

Context is everything sent to the LLM in a single API call. It includes:

- **System Prompt**: agent instructions, role, and goals (~2K tokens)
- **Chat History**: user/assistant messages (variable)
- **Tool Schemas**: function definitions (~2K tokens)
- **Tool Outputs**: results from tool calls (~20K tokens)
- **Memory/RAG**: retrieved context (~4K tokens)
- **Output Reserve**: space for the LLM response (~8-16K tokens)

How Context Flows

Single Agent Flow

Multi-Agent Flow


Optimization Strategies

When context exceeds the threshold (default 80%), the optimizer kicks in:
| Strategy | How It Works | Best For |
|---|---|---|
| `truncate` | Remove oldest messages | Simple chatbots |
| `sliding_window` | Keep N recent messages | Long conversations |
| `prune_tools` | Truncate old tool outputs | Tool-heavy agents |
| `summarize` | LLM summarizes old context | Critical context |
| `smart` | Combines all strategies | Production use |
| `non_destructive` | Tag for exclusion (undo-able) | Audit trails |
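
To pin a specific strategy rather than rely on the `smart` default, pass it via `ManagerConfig`. A minimal sketch using the `strategy` and `compact_threshold` options from the Configuration Reference below:

```python
from praisonaiagents import Agent, ManagerConfig

agent = Agent(
    instructions="You are helpful",
    context=ManagerConfig(
        strategy="sliding_window",  # any of the six strategies above
        compact_threshold=0.8,      # optimize once usage crosses 80%
    ),
)
```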

Smart Strategy Flow


Overflow Handling

| Level | Usage | Action |
|---|---|---|
| Normal | < 70% | No action |
| Warning | 70-80% | Monitor |
| Critical | 80-90% | Auto-compact triggers |
| Emergency | 90-95% | Aggressive optimization |
| Overflow | > 95% | Emergency truncation |
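
These levels are simple ratios of used tokens to the usable budget. A minimal illustrative classifier (not the library's internal code):

```python
def overflow_level(used_tokens: int, usable_budget: int) -> str:
    """Map context usage to the overflow levels in the table above."""
    usage = used_tokens / usable_budget
    if usage < 0.70:
        return "normal"      # no action
    if usage < 0.80:
        return "warning"     # monitor
    if usage < 0.90:
        return "critical"    # auto-compact triggers
    if usage < 0.95:
        return "emergency"   # aggressive optimization
    return "overflow"        # emergency truncation

print(overflow_level(6_580, 111_616))  # "normal" (~5.9%)
```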

Token Budgeting

The Context Budgeter allocates tokens across segments:
```python
from praisonaiagents import ContextBudgeter

budgeter = ContextBudgeter(model="gpt-4o-mini")
budget = budgeter.allocate()

print(f"Model limit: {budget.model_limit:,}")       # 128,000
print(f"Output reserve: {budget.output_reserve:,}") # 16,384
print(f"Usable: {budget.usable:,}")                 # 111,616
```

Model Limits

| Model | Context | Output Reserve |
|---|---|---|
| gpt-4o | 128K | 16K |
| gpt-4o-mini | 128K | 16K |
| claude-3-opus | 200K | 8K |
| gemini-1.5-pro | 2M | 8K |

Per-Tool Budgets

Set different limits for different tools:
```python
from praisonaiagents import Agent, ManagerConfig

agent = Agent(
    instructions="You are helpful",
    context=ManagerConfig(
        tool_limits={
            "tavily_search": 2000,    # Search: 2K chars
            "tavily_extract": 5000,   # Full page: 5K chars
            "code_executor": 10000,   # Code output: 10K chars
        },
        protected_tools=["file_read"],  # Never pruned
    )
)
```

Session Deduplication

Session deduplication prevents duplicate content from being sent to the LLM more than once across agents in multi-agent workflows.
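
The idea can be pictured as content hashing per session: a payload already present in context, such as the same tool output fetched by two agents, is skipped on repeat. An illustrative sketch, not PraisonAI's internal implementation:

```python
import hashlib

class SessionDeduplicator:
    """Illustrative: track content hashes seen within a session."""

    def __init__(self):
        self._seen: set[str] = set()

    def is_duplicate(self, content: str) -> bool:
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest in self._seen:
            return True   # already in context; skip re-adding
        self._seen.add(digest)
        return False

dedup = SessionDeduplicator()
print(dedup.is_duplicate("search result A"))  # False: first occurrence
print(dedup.is_duplicate("search result A"))  # True: second agent, same payload
```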

Multi-Agent Policies

Control how context is shared between agents:
| Mode | Description | Use Case |
|---|---|---|
| NONE | No context shared | Independent agents |
| SUMMARY | Summarized context | Reduce tokens |
| FULL | Full context (bounded) | Continuity needed |
```python
from praisonaiagents import ContextPolicy, ContextShareMode

policy = ContextPolicy(
    share=True,
    share_mode=ContextShareMode.SUMMARY,
    max_tokens=5000,
    preserve_recent_turns=3,
)
```
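
With this policy, downstream agents receive a summary capped at 5,000 tokens while the 3 most recent turns are preserved verbatim; how the policy object is wired into a specific multi-agent setup depends on your orchestration code.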

Context Monitoring

Real-time snapshots for debugging:
```python
from praisonaiagents import Agent, ManagerConfig

agent = Agent(
    instructions="You are helpful",
    context=ManagerConfig(
        monitor_enabled=True,
        monitor_path="./context.txt",
        monitor_format="human",  # or "json"
        redact_sensitive=True,
    )
)
```

Snapshot Output

```text
================================================================================
PRAISONAI CONTEXT SNAPSHOT
================================================================================
Timestamp: 2026-01-24T06:00:00Z
Model: gpt-4o-mini
Model Limit: 128,000 tokens
Usable Budget: 111,616 tokens

--------------------------------------------------------------------------------
TOKEN LEDGER
--------------------------------------------------------------------------------
Segment              |     Tokens |     Budget |    Usage
--------------------------------------------------------------------------------
System Prompt        |        150 |      2,000 |    7.5%
History              |      5,230 |     84,616 |    6.2%
Tool Outputs         |      1,200 |     20,000 |    6.0%
--------------------------------------------------------------------------------
TOTAL                |      6,580 |    111,616 |    5.9%
```

Token Estimation

Fast offline token counting (no API calls):
```python
from praisonaiagents import (
    estimate_tokens_heuristic,
    estimate_messages_tokens,
)

# Estimate text tokens
tokens = estimate_tokens_heuristic("Hello world!")  # ~3

# Estimate message tokens
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"},
]
tokens = estimate_messages_tokens(messages)  # ~12
```
| Content Type | Accuracy |
|---|---|
| English text | ~90-95% |
| Code | ~85-90% |
| Non-ASCII | ~80-85% |
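
These figures are consistent with a characters-per-token heuristic, roughly 4 characters per token for English text. A minimal sketch of that style of estimate (an assumption about the method, not the library's exact formula):

```python
def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic: ~4 chars/token for English; code and non-ASCII skew lower."""
    return max(1, round(len(text) / chars_per_token))

print(rough_token_estimate("Hello world!"))  # 3, matching the example above
```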

FastContext

Rapid parallel code search for AI agents:
```python
from praisonaiagents.context.fast import FastContext

fc = FastContext(
    workspace_path=".",
    max_turns=4,
    max_parallel=8,
)

result = fc.search("find authentication handlers")
print(f"Found {result.total_files} files in {result.search_time_ms}ms")
```
| Metric | Value |
|---|---|
| Search latency | 100-200ms |
| Cache hit | < 1ms |
| Parallel speedup | 2-5x |

CLI Commands

```bash
# Enable context in chat
praisonai chat --context

# Set strategy
praisonai chat --context-strategy smart

# Set threshold
praisonai chat --context-threshold 0.8

# Enable monitoring
praisonai chat --context-monitor
```

In-Session Commands

| Command | Description |
|---|---|
| `/context on` | Enable monitoring |
| `/context off` | Disable monitoring |
| `/context stats` | Show token ledger |
| `/context dump` | Write snapshot now |
| `/context compact` | Force optimization |

Configuration Reference

ManagerConfig Options

| Option | Type | Default | Description |
|---|---|---|---|
| `auto_compact` | bool | `True` | Auto-optimize on threshold |
| `compact_threshold` | float | `0.8` | Trigger at this usage % |
| `strategy` | str | `"smart"` | Optimization strategy |
| `output_reserve` | int | Model-specific | Reserved for output |
| `llm_summarize` | bool | `False` | Use LLM for summarization |
| `tool_limits` | dict | `{}` | Per-tool token limits |
| `protected_tools` | list | `[]` | Tools never pruned |
| `monitor_enabled` | bool | `False` | Enable snapshots |
| `redact_sensitive` | bool | `True` | Redact secrets |
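
Putting the options together, a production-style configuration might look like this (values are illustrative):

```python
from praisonaiagents import Agent, ManagerConfig

agent = Agent(
    instructions="You are helpful",
    context=ManagerConfig(
        auto_compact=True,
        compact_threshold=0.8,
        strategy="smart",
        llm_summarize=False,
        tool_limits={"tavily_search": 2000},
        protected_tools=["file_read"],
        monitor_enabled=True,
        redact_sensitive=True,
    ),
)
```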

Environment Variables

```bash
PRAISONAI_CONTEXT_OUTPUT_RESERVE=8000
PRAISONAI_CONTEXT_THRESHOLD=0.8
PRAISONAI_CONTEXT_MONITOR=true
```

Best Practices

- Always enable `context=True` for agents with tools to prevent token overflow from large search results.
- Use the `smart` strategy in production; it combines all optimization techniques.
- Configure lower limits for verbose tools (search, web scraping) and higher limits for code execution.
- Enable `monitor_enabled=True` to debug context issues and understand token usage.
- In multi-agent workflows, deduplication prevents the same content from being processed multiple times.

Context management uses lazy loading throughout. Setting `context=True` adds only a single boolean assignment at agent creation time (0ms); the ContextManager is instantiated only when first accessed, as sketched below.
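
The underlying pattern is ordinary lazy initialization. An illustrative sketch of the idea, not the library's actual code:

```python
class ContextManagerSketch:
    """Stand-in for the real ContextManager."""

class AgentSketch:
    def __init__(self, context: bool = False):
        self._context_enabled = context  # only a boolean assignment (0ms)
        self._manager = None             # nothing instantiated yet

    @property
    def context_manager(self):
        # Built on first access, not at agent creation
        if self._context_enabled and self._manager is None:
            self._manager = ContextManagerSketch()
        return self._manager
```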