Context Compaction Policy runs a token-budget check before each LLM call in a long agent run and picks a compaction strategy before the context window overflows.

Quick Start

Step 1: Enable with defaults (one line)

from praisonaiagents import Agent, ExecutionConfig

agent = Agent(
    name="LongRunner",
    instructions="Handle long multi-turn research sessions.",
    execution=ExecutionConfig(context_compaction=True),
)

agent.start("Research the entire history of large language models...")

Step 2: Pick a preset

from praisonaiagents import Agent, ExecutionConfig, CONSERVATIVE_POLICY

agent = Agent(
    name="CautiousAgent",
    execution=ExecutionConfig(context_compaction=CONSERVATIVE_POLICY),
)

Step 3: Custom policy

from praisonaiagents import Agent, ExecutionConfig, ContextCompactionPolicy

agent = Agent(
    name="CustomAgent",
    execution=ExecutionConfig(
        context_compaction=ContextCompactionPolicy(
            trigger_at=0.85,
            strategy="summarise",
            preserve_last_n_turns=8,
            target_utilization=0.65,
        )
    ),
)

How It Works

Route | Trigger | What happens
--- | --- | ---
FITS | utilization < trigger_at | No action; messages pass through
COMPACT_NEEDED | utilization ≥ trigger_at | The configured strategy runs against the history
TRUNCATE_TOOLS | A tool output exceeds 1000 chars and aggressive_tool_truncation=True | Tool outputs are truncated to head 300 / tail 200 chars with a marker
COMPACT_THEN_TRUNCATE | utilization ≥ 0.95 | Both compaction and tool truncation
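The routing table can be read as a short decision function. The sketch below is a hypothetical stand-in, not the praisonaiagents source: it assumes routes are checked from most to least severe, that utilization means used tokens divided by the model's context window, and that the 0.95 and 1000-character thresholds are fixed constants. The name choose_route is invented for illustration.

```python
# Hypothetical sketch of the routing table; NOT the library's implementation.
HARD_LIMIT = 0.95           # assumed fixed: utilization at which both remedies run
TOOL_CHAR_THRESHOLD = 1000  # tool outputs longer than this can be truncated


def choose_route(used_tokens, context_window, trigger_at, tool_outputs,
                 aggressive_tool_truncation=True):
    """Return a route value: fits, compact_needed, truncate_tools,
    or compact_then_truncate. Checks run from most to least severe."""
    utilization = used_tokens / context_window
    if utilization >= HARD_LIMIT:
        return "compact_then_truncate"
    if utilization >= trigger_at:
        return "compact_needed"
    if aggressive_tool_truncation and any(
        len(out) > TOOL_CHAR_THRESHOLD for out in tool_outputs
    ):
        return "truncate_tools"
    return "fits"


print(choose_route(60_000, 128_000, 0.90, []))             # fits
print(choose_route(118_000, 128_000, 0.90, []))            # compact_needed
print(choose_route(60_000, 128_000, 0.90, ["x" * 1500]))   # truncate_tools
print(choose_route(123_000, 128_000, 0.90, []))            # compact_then_truncate
```

Ordering the checks by severity means a nearly full context never settles for tool truncation alone, which matches the "last-resort" role of COMPACT_THEN_TRUNCATE.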


Configuration Options

Option | Type | Default | Description
--- | --- | --- | ---
trigger_at | float | 0.90 | Context-utilization fraction that triggers compaction. Range [0.1, 0.99]; must be greater than target_utilization.
strategy | str or CompactionStrategy | "drop_oldest_tools" | One of "truncate", "summarise", "drop_oldest_tools", "sliding_window".
preserve_last_n_turns | int | 5 | Number of most recent conversation turns that compaction never touches.
max_compaction_attempts | int | 2 | Maximum compaction passes per LLM call.
target_utilization | float | 0.70 | Utilization to aim for after compaction. Range [0.1, 0.95].
aggressive_tool_truncation | bool | True | When True, tool outputs longer than 1000 chars are truncated to the first 300 and last 200 chars.
model_overrides | dict[str, dict] or None | None | Per-model overrides, e.g. {"gpt-4o-mini": {"trigger_at": 0.75}}.
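The documented ranges translate into a handful of checks. This is a hypothetical validator (validate_policy_options is not a library function), and the real ContextCompactionPolicy may enforce these rules differently or raise a different exception type. Note that the strategy value uses the British spelling "summarise", so "summarize" is rejected by this sketch.

```python
# Hypothetical validator mirroring the documented ranges; not the library's code.
VALID_STRATEGIES = {"truncate", "summarise", "drop_oldest_tools", "sliding_window"}


def validate_policy_options(trigger_at=0.90, strategy="drop_oldest_tools",
                            target_utilization=0.70):
    """Raise ValueError if an option falls outside its documented range."""
    if not 0.1 <= trigger_at <= 0.99:
        raise ValueError(f"trigger_at must be in [0.1, 0.99], got {trigger_at}")
    if not 0.1 <= target_utilization <= 0.95:
        raise ValueError(
            f"target_utilization must be in [0.1, 0.95], got {target_utilization}"
        )
    # Compaction must have somewhere lower to aim than where it starts.
    if trigger_at <= target_utilization:
        raise ValueError("trigger_at must be greater than target_utilization")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}")


# The custom policy from Step 3 passes these checks.
validate_policy_options(trigger_at=0.85, strategy="summarise", target_utilization=0.65)
```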

Presets

CONSERVATIVE_POLICY

CONSERVATIVE_POLICY = ContextCompactionPolicyAdapter(
    trigger_at=0.80,
    strategy=CompactionStrategy.DROP_OLDEST_TOOLS,
    preserve_last_n_turns=8,
    target_utilization=0.60,
)
When to use: short conversations, or cheap models where compacting early has little cost impact.

BALANCED_POLICY (Default)

BALANCED_POLICY = ContextCompactionPolicyAdapter(
    trigger_at=0.90,
    strategy=CompactionStrategy.DROP_OLDEST_TOOLS,
    preserve_last_n_turns=5,
    target_utilization=0.70,
)
When to use: Most agents and standard use cases.

AGGRESSIVE_POLICY

AGGRESSIVE_POLICY = ContextCompactionPolicyAdapter(
    trigger_at=0.95,
    strategy=CompactionStrategy.SUMMARISE,
    preserve_last_n_turns=3,
    target_utilization=0.75,
    aggressive_tool_truncation=True,
)
When to use: Long tool-heavy loops or token-limited models where maximum context utilization is critical.
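To see what the presets mean in tokens, here is the arithmetic for an assumed 128,000-token context window. The trigger and target values are copied from the presets above; preset_budget is an illustrative helper, and "frees" assumes a single compaction pass lands exactly on the target.

```python
# Trigger/target values copied from the preset definitions above.
PRESETS = {
    "CONSERVATIVE": {"trigger_at": 0.80, "target_utilization": 0.60},
    "BALANCED": {"trigger_at": 0.90, "target_utilization": 0.70},
    "AGGRESSIVE": {"trigger_at": 0.95, "target_utilization": 0.75},
}


def preset_budget(context_window: int, trigger_at: float, target_utilization: float):
    """Return (tokens at trigger, tokens at target, tokens a pass aims to free)."""
    trigger_tokens = round(context_window * trigger_at)
    target_tokens = round(context_window * target_utilization)
    return trigger_tokens, target_tokens, trigger_tokens - target_tokens


for name, preset in PRESETS.items():
    trigger, target, freed = preset_budget(128_000, **preset)
    print(f"{name}: compacts at {trigger:,} tokens, aims for {target:,}, frees ~{freed:,}")
```

All three presets keep a 0.20 gap between trigger_at and target_utilization, so on this window each pass aims to free about 25,600 tokens. The presets differ mainly in how late compaction starts, which strategy runs, and how many recent turns are protected.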

Routes (CompactionRoute enum)

Route | Value | Action
--- | --- | ---
FITS | "fits" | No action; context is within budget
COMPACT_NEEDED | "compact_needed" | Run the strategy on the history
TRUNCATE_TOOLS | "truncate_tools" | Shrink tool outputs only
COMPACT_THEN_TRUNCATE | "compact_then_truncate" | Compaction plus tool truncation; last-resort recovery

Strategies (CompactionStrategy enum)

Strategy | Value | Description | Legacy equivalent
--- | --- | --- | ---
TRUNCATE | "truncate" | Remove the oldest messages | TRUNCATE
SUMMARISE | "summarise" | LLM-based summarization of old messages | SUMMARIZE
DROP_OLDEST_TOOLS | "drop_oldest_tools" | Remove old tool outputs first | PRUNE
SLIDING_WINDOW | "sliding_window" | Keep only recent messages | SLIDING
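Because strategy accepts either the enum or its string value, a str-based Enum turns a plain string into a member in one call. The class below is a local mirror of the documented values for illustration (this page does not show the import path of the library's own CompactionStrategy), and LEGACY_NAME only records the "Legacy equivalent" column as strings; it does not import the legacy enum.

```python
from enum import Enum


class CompactionStrategy(str, Enum):
    # Local mirror of the documented values, for illustration only.
    TRUNCATE = "truncate"
    SUMMARISE = "summarise"
    DROP_OLDEST_TOOLS = "drop_oldest_tools"
    SLIDING_WINDOW = "sliding_window"


# "Legacy equivalent" column from the table above, recorded as plain strings.
LEGACY_NAME = {
    CompactionStrategy.TRUNCATE: "TRUNCATE",
    CompactionStrategy.SUMMARISE: "SUMMARIZE",
    CompactionStrategy.DROP_OLDEST_TOOLS: "PRUNE",
    CompactionStrategy.SLIDING_WINDOW: "SLIDING",
}


def parse_strategy(value) -> CompactionStrategy:
    """Accept an enum member or its string value; unknown strings raise ValueError."""
    return CompactionStrategy(value)


print(LEGACY_NAME[parse_strategy("summarise")])  # SUMMARIZE
```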

Tool Output Truncation

When aggressive_tool_truncation=True and any tool message content exceeds 1000 characters:
  • Threshold: 1000 chars
  • Keep: Head 300 chars + tail 200 chars
  • Marker: ...[truncated N chars for context budget]...
Set aggressive_tool_truncation=False to disable this behavior.

Model Overrides

Use model_overrides to apply different settings per model:
ContextCompactionPolicy(
    trigger_at=0.90,
    model_overrides={
        "gpt-4o-mini": {"trigger_at": 0.80, "target_utilization": 0.60},
        "claude-haiku-4-5": {"trigger_at": 0.85},
    },
)
The override values take precedence over the base configuration for the specified models.
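One way to picture override resolution is a shallow dict merge. This sketch assumes overrides are looked up by exact model name and that any key an override omits falls back to the base value; resolve_policy is a hypothetical helper, not part of the library.

```python
def resolve_policy(base: dict, model: str) -> dict:
    """Merge base settings with the override for `model`, if one exists."""
    overrides = base.get("model_overrides") or {}
    resolved = {k: v for k, v in base.items() if k != "model_overrides"}
    resolved.update(overrides.get(model, {}))  # override keys win
    return resolved


base = {
    "trigger_at": 0.90,
    "target_utilization": 0.70,
    "model_overrides": {
        "gpt-4o-mini": {"trigger_at": 0.80, "target_utilization": 0.60},
        "claude-haiku-4-5": {"trigger_at": 0.85},
    },
}

print(resolve_policy(base, "gpt-4o-mini"))       # {'trigger_at': 0.8, 'target_utilization': 0.6}
print(resolve_policy(base, "claude-haiku-4-5"))  # {'trigger_at': 0.85, 'target_utilization': 0.7}
print(resolve_policy(base, "gpt-4o"))            # {'trigger_at': 0.9, 'target_utilization': 0.7}
```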

YAML / dict configuration

Policies serialize via to_dict() / from_dict() for CLI/YAML support:
execution:
  context_compaction:
    trigger_at: 0.85
    strategy: drop_oldest_tools
    preserve_last_n_turns: 5
    target_utilization: 0.65
    aggressive_tool_truncation: true
Strategy accepts a plain string in dict/YAML form.
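The round trip can be modelled with a dataclass. PolicySketch and Strategy below are hypothetical stand-ins for ContextCompactionPolicy and CompactionStrategy, showing one plausible shape for to_dict() / from_dict(): the enum is written out as its plain-string value and parsed back on load. A YAML loader such as PyYAML's yaml.safe_load would yield a dict like yaml_section from the context_compaction block above.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Strategy(str, Enum):
    TRUNCATE = "truncate"
    SUMMARISE = "summarise"
    DROP_OLDEST_TOOLS = "drop_oldest_tools"
    SLIDING_WINDOW = "sliding_window"


@dataclass
class PolicySketch:
    # Field names and defaults follow the Configuration Options table.
    trigger_at: float = 0.90
    strategy: Strategy = Strategy.DROP_OLDEST_TOOLS
    preserve_last_n_turns: int = 5
    max_compaction_attempts: int = 2
    target_utilization: float = 0.70
    aggressive_tool_truncation: bool = True
    model_overrides: Optional[dict] = None

    def to_dict(self) -> dict:
        data = asdict(self)
        data["strategy"] = self.strategy.value  # store the plain string
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "PolicySketch":
        data = dict(data)  # don't mutate the caller's dict
        if "strategy" in data:
            data["strategy"] = Strategy(data["strategy"])  # string -> enum
        return cls(**data)


yaml_section = {
    "trigger_at": 0.85,
    "strategy": "drop_oldest_tools",
    "preserve_last_n_turns": 5,
    "target_utilization": 0.65,
    "aggressive_tool_truncation": True,
}
policy = PolicySketch.from_dict(yaml_section)
print(policy.to_dict()["strategy"])  # drop_oldest_tools
```

Keys missing from the dict (here max_compaction_attempts and model_overrides) simply take the dataclass defaults.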

Deprecation Notice

ExecutionConfig.context_compaction currently defaults to False. In the next release it will default to True, enabling proactive context-overflow protection. To opt in early, set context_compaction=True or pass a policy; to keep compaction disabled after the change, set context_compaction=False explicitly.

Common Patterns

Use BALANCED for most agents

from praisonaiagents import Agent, ExecutionConfig

agent = Agent(
    name="StandardAgent",
    execution=ExecutionConfig(context_compaction=True),  # Uses BALANCED_POLICY
)

Use AGGRESSIVE for token-tight models

from praisonaiagents import Agent, ExecutionConfig, AGGRESSIVE_POLICY

agent = Agent(
    name="TokenTightAgent", 
    llm="gpt-4o-mini",
    execution=ExecutionConfig(context_compaction=AGGRESSIVE_POLICY),
)

Per-model overrides for multi-model agents

from praisonaiagents import Agent, ExecutionConfig, ContextCompactionPolicy

policy = ContextCompactionPolicy(
    trigger_at=0.90,
    model_overrides={
        "gpt-4o-mini": {"trigger_at": 0.75},
        "gpt-4o": {"trigger_at": 0.95},
    }
)

agent = Agent(
    name="MultiModelAgent",
    execution=ExecutionConfig(context_compaction=policy),
)

Best Practices

  • Tune trigger_at to the context window. The default of 0.90 works well for 128k-token models; for smaller context windows, consider lowering it to 0.80 to 0.85 to leave enough headroom.
  • Keep preserve_last_n_turns large enough that the agent doesn't lose the active sub-task or the recent conversation context it needs for coherent responses.
  • If your agent swaps between cheap and large models, use model_overrides to give each model a threshold suited to its context window and cost.
  • For tool-heavy agents (code execution, web search, RAG), keep aggressive_tool_truncation enabled. Large tool outputs often contain redundant information, and truncation keeps the first 300 and last 200 characters of each one.
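The headroom advice is easy to check with arithmetic. headroom_tokens is an illustrative helper that counts the tokens still free at the moment compaction triggers; the window sizes are examples.

```python
def headroom_tokens(context_window: int, trigger_at: float) -> int:
    """Tokens still free in the window when utilization reaches trigger_at."""
    return context_window - round(context_window * trigger_at)


for window in (128_000, 8_192):
    for trigger_at in (0.90, 0.80):
        print(f"{window:>7,} window, trigger_at={trigger_at}: "
              f"{headroom_tokens(window, trigger_at):,} tokens of headroom")
```

At trigger_at=0.90 an 8,192-token model has about 819 tokens left when compaction starts, versus 12,800 on a 128k model; lowering the trigger to 0.80 doubles the small model's margin to about 1,638.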

Related

  • Context Compaction: the reactive CompactionConfig system
  • Execution Config: agent execution configuration options
  • LLM Context Compression: LLM-driven message-history compression
  • Intelligent Conversation Compaction: smart conversation summarization