Quick Start
How It Works
| Phase | Trigger | What happens |
|---|---|---|
| FITS | utilization < trigger_at | No action; messages pass through |
| COMPACT_NEEDED | utilization ≥ trigger_at | Strategy runs against history |
| TRUNCATE_TOOLS | Any tool output > 1000 chars and aggressive_tool_truncation=True | Tool outputs truncated to head 300 / tail 200 with marker |
| COMPACT_THEN_TRUNCATE | utilization ≥ 0.95 | Both compaction and tool truncation |
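
The table above can be read as a small decision function. The following is a minimal sketch, not the library's implementation: `Route` and `choose_route` are hypothetical names, and the order in which the conditions are checked is an assumption, since this page does not state which row wins when several triggers apply at once.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical stand-in for CompactionRoute; values taken from this page."""
    FITS = "fits"
    COMPACT_NEEDED = "compact_needed"
    TRUNCATE_TOOLS = "truncate_tools"
    COMPACT_THEN_TRUNCATE = "compact_then_truncate"


def choose_route(utilization, tool_output_lengths, trigger_at=0.90,
                 aggressive_tool_truncation=True):
    """Pick a route for one LLM call.

    Assumption: the most severe condition is checked first. The real
    precedence between rows is not documented here.
    """
    if utilization >= 0.95:
        return Route.COMPACT_THEN_TRUNCATE
    if utilization >= trigger_at:
        return Route.COMPACT_NEEDED
    # Below the trigger: only oversized tool outputs can still cause work.
    oversized = any(length > 1000 for length in tool_output_lengths)
    if aggressive_tool_truncation and oversized:
        return Route.TRUNCATE_TOOLS
    return Route.FITS
```

Under these assumptions, a call at 50% utilization with one 2,000-character tool output is routed to `TRUNCATE_TOOLS`, while the same call with `aggressive_tool_truncation=False` is routed to `FITS`.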
Choose Your Policy
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| trigger_at | float | 0.90 | Context utilization fraction that triggers compaction. Range [0.1, 0.99]. Must be > target_utilization. |
| strategy | str \| CompactionStrategy | "drop_oldest_tools" | One of "truncate", "summarise", "drop_oldest_tools", "sliding_window". |
| preserve_last_n_turns | int | 5 | Conversation turns at the tail that compaction never touches. |
| max_compaction_attempts | int | 2 | Maximum compaction passes per LLM call. |
| target_utilization | float | 0.70 | Post-compaction utilization target. Range [0.1, 0.95]. |
| aggressive_tool_truncation | bool | True | When True, tool outputs > 1000 chars get truncated to head 300 / tail 200. |
| model_overrides | dict[str, dict] \| None | None | Per-model overrides (e.g. {"gpt-4o-mini": {"trigger_at": 0.75}}). |
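
The defaults and range constraints in this table can be expressed as a small dataclass. This is a sketch for illustration only: `PolicySketch` is a hypothetical class, not the library's policy type, and raising `ValueError` on invalid input is an assumption about how such constraints might be enforced.

```python
from dataclasses import dataclass


@dataclass
class PolicySketch:
    """Hypothetical mirror of the options table; not the library's class."""
    trigger_at: float = 0.90
    strategy: str = "drop_oldest_tools"
    preserve_last_n_turns: int = 5
    max_compaction_attempts: int = 2
    target_utilization: float = 0.70
    aggressive_tool_truncation: bool = True

    def __post_init__(self):
        # Ranges and the ordering rule are copied from the table above.
        if not 0.1 <= self.trigger_at <= 0.99:
            raise ValueError("trigger_at must be in [0.1, 0.99]")
        if not 0.1 <= self.target_utilization <= 0.95:
            raise ValueError("target_utilization must be in [0.1, 0.95]")
        if self.trigger_at <= self.target_utilization:
            raise ValueError("trigger_at must be > target_utilization")
```

For example, `PolicySketch(trigger_at=0.60)` fails in this sketch because 0.60 is not greater than the default `target_utilization` of 0.70, so lowering the trigger usually means lowering the target as well.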
Presets
CONSERVATIVE_POLICY
BALANCED_POLICY (Default)
AGGRESSIVE_POLICY
Routes (CompactionRoute enum)
| Route | Value | Action |
|---|---|---|
| FITS | "fits" | No action; context within budget |
| COMPACT_NEEDED | "compact_needed" | Run strategy on history |
| TRUNCATE_TOOLS | "truncate_tools" | Shrink tool outputs only |
| COMPACT_THEN_TRUNCATE | "compact_then_truncate" | Both, as last-resort recovery |
Strategies (CompactionStrategy enum)
| Strategy | Value | Description | Maps to Legacy |
|---|---|---|---|
| TRUNCATE | "truncate" | Remove oldest messages | TRUNCATE |
| SUMMARISE | "summarise" | LLM-based summarization of old messages | SUMMARIZE |
| DROP_OLDEST_TOOLS | "drop_oldest_tools" | Remove old tool outputs first | PRUNE |
| SLIDING_WINDOW | "sliding_window" | Keep recent messages only | SLIDING |
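
Because the `strategy` option accepts either a string or an enum member, it may help to see how the two forms could be reconciled. The sketch below uses a hypothetical `StrategySketch` enum with the values from the table and a hypothetical `coerce_strategy` helper; the library's own coercion logic is not shown on this page and may differ.

```python
from enum import Enum


class StrategySketch(str, Enum):
    """Hypothetical stand-in for CompactionStrategy; values from the table."""
    TRUNCATE = "truncate"
    SUMMARISE = "summarise"
    DROP_OLDEST_TOOLS = "drop_oldest_tools"
    SLIDING_WINDOW = "sliding_window"


def coerce_strategy(value):
    """Accept a member or its string value.

    Enum lookup by value raises ValueError for strings that match no member.
    """
    if isinstance(value, StrategySketch):
        return value
    return StrategySketch(value)
```

Note that the string value is spelled "summarise" while the legacy name is SUMMARIZE, so in this sketch `coerce_strategy("summarize")` raises `ValueError`.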
Tool Output Truncation
When aggressive_tool_truncation=True and any tool message content exceeds 1000 characters:
- Threshold: 1000 chars
- Keep: Head 300 chars + tail 200 chars
- Marker:
...[truncated N chars for context budget]...
Set aggressive_tool_truncation=False to disable this behavior.
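
The head/tail rule can be sketched in a few lines. `truncate_tool_output` is a hypothetical helper, not a function from the library. Two details are assumptions: that N in the marker counts the characters removed, and that the marker is placed inline between head and tail without extra newlines.

```python
def truncate_tool_output(text, threshold=1000, head=300, tail=200):
    """Keep the first `head` and last `tail` characters of an oversized output."""
    if len(text) <= threshold:
        # At or under the threshold: returned unchanged.
        return text
    removed = len(text) - head - tail  # assumed meaning of N in the marker
    marker = f"...[truncated {removed} chars for context budget]..."
    return text[:head] + marker + text[-tail:]
```

With these defaults, a 1,500-character output keeps 500 original characters and reports 1,000 as truncated.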
Model Overrides
Use model_overrides to apply different settings per model:
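
One way to picture how per-model settings combine with the base policy is a shallow dictionary merge. The sketch below is illustrative: `resolve_for_model` is a hypothetical helper, and matching on the exact model name is an assumption, since this page does not describe how the library looks up overrides.

```python
def resolve_for_model(base, model_overrides, model):
    """Return base settings with any override for `model` layered on top."""
    merged = dict(base)  # copy so the base settings are never mutated
    merged.update((model_overrides or {}).get(model, {}))
    return merged


base = {"trigger_at": 0.90, "target_utilization": 0.70}
overrides = {"gpt-4o-mini": {"trigger_at": 0.75}}

small = resolve_for_model(base, overrides, "gpt-4o-mini")
large = resolve_for_model(base, overrides, "gpt-4o")
```

Here `small` ends up with `trigger_at` 0.75 while `large` keeps the base value of 0.90, because only the first model name has an entry in `overrides`.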
YAML / dict configuration
Policies serialize via to_dict() / from_dict() for CLI/YAML support:
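
A round trip through a plain dictionary might look like the following. This is a sketch under stated assumptions: `SerializablePolicySketch` is a hypothetical class with only three of the documented fields, its `to_dict()` / `from_dict()` bodies are illustrative rather than the library's, and JSON from the standard library stands in for YAML to keep the example dependency-free.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SerializablePolicySketch:
    """Hypothetical policy with a subset of the documented options."""
    trigger_at: float = 0.90
    strategy: str = "drop_oldest_tools"
    target_utilization: float = 0.70

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        # Assumes every key in `data` is a known field.
        return cls(**data)


original = SerializablePolicySketch(trigger_at=0.80, strategy="sliding_window")
text = json.dumps(original.to_dict())  # what a config file might hold
restored = SerializablePolicySketch.from_dict(json.loads(text))
```

In this sketch `restored` compares equal to `original`, which is the property a config file round trip needs.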
Deprecation Notice
Common Patterns
Use BALANCED for most agents
Use AGGRESSIVE for token-tight models
Per-model overrides for multi-model agents
Best Practices
Set trigger_at lower than your model's hard limit ratio
The default 0.90 is fine for 128k models. For smaller context windows, consider lowering to 0.80-0.85 to ensure sufficient headroom.
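
The headroom argument is easier to see with numbers. The helper below is a hypothetical illustration, and the 8,192-token window is an assumed example of a smaller context; neither comes from the library.

```python
def headroom_tokens(context_window, trigger_at):
    """Tokens left between the compaction trigger and the hard limit."""
    return round(context_window * (1 - trigger_at))


large_default = headroom_tokens(128_000, 0.90)  # 128k window, default trigger
small_default = headroom_tokens(8_192, 0.90)    # assumed small window, default trigger
small_lowered = headroom_tokens(8_192, 0.80)    # same window, lowered trigger
```

At the default trigger a 128k window leaves 12,800 tokens of headroom, while an 8,192-token window leaves only 819. Lowering the trigger to 0.80 doubles that to 1,638.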
Keep preserve_last_n_turns >= 3
This ensures the agent doesn’t lose the active sub-task or recent conversation context that’s critical for coherent responses.
Use model_overrides for mixed-model workflows
If your agent swaps between cheap and large models, set different thresholds to optimize token usage for each model type.
aggressive_tool_truncation=True is the right default
For tool-heavy agents (code execution, web search, RAG), large tool outputs often contain redundant information. Truncation preserves the essential parts.
Related
Context Compaction
The reactive CompactionConfig system
Execution Config
Agent execution configuration options
LLM Context Compression
LLM-driven message-history compression
Intelligent Conversation Compaction
Smart conversation summarization

