Looking for the proactive policy-based system added in PR #1828? See Context Compaction Policy. This page documents the reactive
CompactionConfig system used inside Agent(context=...).Quick Start
Anti-Thrashing Protection
Prevents endless compaction cycles in long-running agents by tracking savings effectiveness and giving up when returns diminish.How Anti-Thrashing Works
- Savings Tracking: Each compaction calculates
(original_tokens - compacted_tokens) / original_tokens * 100 - Streak Counter: Increments when savings <
min_savings_pct, resets on good savings - Circuit Breaker: Stops compaction when
streak >= max_consecutive_low_savings - Reset Trigger: New messages arriving resets the protection state
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
min_savings_pct | float | 10.0 | Minimum savings percentage required (0-100) |
max_consecutive_low_savings | int | 2 | Max failed attempts before giving up |
min_savings_pct are auto-scaled (e.g., 0.15 becomes 15.0).
Iterative Summarisation
Builds upon previous summaries instead of starting fresh, preserving context across multiple compaction cycles.Iterative vs Fresh Summaries
| Mode | Behavior | Best For |
|---|---|---|
| Iterative (default) | Builds on previous summaries with [Previous Summary] → [New Activity] markers | Long research sessions, ongoing projects |
| Fresh | Summarizes entire conversation history each time | Short sessions, topic switches |
Tool-Result Pruning
Deduplicates and truncates verbose tool outputs before summarization, significantly reducing token waste.Custom Tool Pruner
Focused Summarisation
Biases summarization toward specific topics using thefocus_topic parameter, preserving relevant content while compacting the rest.
How Focus Topic Works
- Content Matching: Text matching the focus topic is preserved verbatim
- LLM Emphasis: When using LLM summarization, adds
Focus especially on: {focus_topic}. - Structured Paths: Marks focused content with
*FOCUS*markers in structured summaries
Focus Topic Use Cases
| Scenario | Focus Topic | Benefit |
|---|---|---|
| Code Review | "security vulnerabilities" | Preserves security discussions |
| Research Session | "performance benchmarks" | Keeps performance data intact |
| Planning Meeting | "delivery milestones" | Maintains timeline information |
Pluggable Protocols
Inject custom implementations for tool pruning, message formatting, and summary building through protocol interfaces.- Tool Result Pruner
- Message Formatter
- Summary Builder
Anti-Injection Framing
Prevents models from treating compacted summaries as active instructions by prepending safety framing.Default Anti-Injection Prefix
Custom Anti-Injection Framing
Summarize
Replace old messages with a summary:Smart
Intelligently select which messages to keep:LLM-Powered Summarization
LLM_SUMMARIZE uses the agent’s own LLM to summarise older turns, preserving identifiers, file paths, URLs, error messages, and the user’s intent verbatim.Fallback behavior: If the LLM call fails, fallback to naive summarization. If invoked from a sync context that’s already inside an event loop, it also falls back to naive — async callers (achat) get full LLM summarization.
Intelligent Conversation Compaction
New structured summarization that preserves conversation continuity:Compactor API
CLI Usage
Structured Summary Template
Organizes compacted content into clear sections instead of flat text.Template Structure
The structured template categorizes messages into six sections:- Active Task - Current user objective
- Completed Actions - Finished operations
- In Progress - Ongoing work
- Pending Questions - Unanswered queries
- Relevant Files / Paths - Mentioned file references
- Remaining Work - Planned future actions
Before/After Example
Before (Flat Summary):Disable Structured Template
Iterative Updates Across Multiple Compactions
Preserves context from previous compactions so long-running agents don’t lose early context.How Iterative Updates Work
- First compaction: Creates initial structured summary
- Second compaction: Merges previous summary with new content
- Subsequent compactions: Continue preserving essential context
Disable Iterative Updates
Configuration Options
Strategies Available
| Strategy | Value | Description |
|---|---|---|
TRUNCATE | "truncate" | Drop oldest messages (default, fastest). |
SLIDING | "sliding" | Sliding-window over recent messages. |
SUMMARIZE | "summarize" | Naive flat textual summary of older messages. |
SMART | "smart" | Heuristic selection of which messages to keep. |
LLM_SUMMARIZE | "llm_summarize" | New. Uses the agent’s LLM to produce a high-quality structured summary. |
PRUNE | "prune" | Removes old tool outputs while keeping the conversation. |
ExecutionConfig Options
| Option | Type | Default | Description |
|---|---|---|---|
context_compaction | bool | False | Enable automatic compaction of chat_history before each LLM call. Zero overhead when False. |
max_context_tokens | Optional[int] | None (auto-detect from model) | Token limit before compaction triggers. |
compaction_strategy | Optional[CompactionStrategy] | None (resolves to TRUNCATE) | Which strategy to use when compaction runs. |
CompactionConfig Options
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | True | Enable context compaction |
max_tokens | int | 8000 | Maximum tokens before compaction |
target_tokens | int | 6000 | Target tokens after compaction |
preserve_system | bool | True | Keep system messages |
preserve_recent | int | 5 | Keep last N messages |
auto_compact | bool | True | Automatically compact when needed |
compaction_prefix | str | COMPACTION_PREFIX | Anti-injection framing prepended to summaries |
structured_template | bool | True | Use organized section template for summaries |
iterative_update | bool | True | Merge previous summary on re-compaction |
min_savings_pct | float | 10.0 | Skip compaction if projected saving < N% (0–100 scale) |
max_consecutive_low_savings | int | 2 | Abort after N low-savings attempts (anti-thrashing) |
tool_prune_before_summarise | bool | True | Deduplicate tool results before summarisation |
max_tool_result_size | int | 500 | Max size for a single tool result before pruning |
enable_iterative_summary | bool | True | Build on previous summaries instead of starting fresh |
min_savings_pct values < 1.0 are auto-scaled (e.g., 0.15 becomes 15.0).
Two Ways to Configure Compaction
| Path | When to use |
|---|---|
Agent(context=True) / Agent(context=CompactionConfig(...)) | You want fine-grained control over the compaction algorithm itself (anti-injection prefix, structured template, iterative updates). |
Agent(execution=ExecutionConfig(context_compaction=True, compaction_strategy=...)) | You want simple, agent-centric enablement, especially for LLM-powered summarization. Recommended for LLM_SUMMARIZE. |
Choose Your Configuration
Inspecting Results
The newCompactionResult provides detailed metrics about compaction operations and their effectiveness.
New CompactionResult Fields
| Field | Type | Description |
|---|---|---|
savings_pct | float | Percentage of tokens saved (computed via calculate_savings_pct()) |
tool_results_pruned | int | Number of tool results that were pruned in the pre-pass |
previous_summary_reused | bool | True when iterative summary feature was used |
was_skipped_due_to_low_savings | bool | True when anti-thrashing protection aborted compaction |
Monitoring Compaction Health
User Interaction Flow
Real-world example showing how the new features work together in a long research session:How This Helps Long Research Sessions
- Hours 1-2: Agent builds initial knowledge about distributed systems
- Hours 3-4: Tool pruning keeps large documentation snippets manageable
- Hours 5-6: Focus topic preserves critical Raft algorithm details
- Hours 7+: Anti-thrashing prevents compaction overhead when context stabilizes
Best Practices
How do I tune anti-thrashing for my workload?
How do I tune anti-thrashing for my workload?
Adjust thresholds based on your agent’s usage pattern:For cost-sensitive workloads:For quality-focused workloads:Monitor
result.was_skipped_due_to_low_savings to see if protection is triggering.When should I write a custom ToolResultPrunerProtocol?
When should I write a custom ToolResultPrunerProtocol?
Write a custom tool pruner when:
- Your tools generate domain-specific outputs that need special handling
- Default size limits don’t match your tool output patterns
- You need to preserve specific data types (IDs, timestamps, etc.)
Iterative summaries vs. fresh summaries — which do I want?
Iterative summaries vs. fresh summaries — which do I want?
Use iterative summaries (default) when:
- Agent runs for hours/days with context continuity
- Research sessions with building knowledge
- Project management with evolving requirements
- Frequent topic switches in conversations
- Agent handles independent requests
- You prefer simpler mental models
What does focus_topic actually do?
What does focus_topic actually do?
Focus topic preserves content in three ways:
- Exact matches are preserved verbatim with
*FOCUS*markers - LLM summarization gets explicit instructions:
"Focus especially on: {focus_topic}." - Structured summaries emphasize focused content in relevant sections
- Long research sessions (“machine learning optimization”)
- Debugging sessions (“authentication errors”)
- Feature development (“payment integration”)
System-only overflow no longer hangs
System-only overflow no longer hangs
Since PR #1980,
_truncate() exits cleanly when only system messages remain over budget — previously this could loop indefinitely. The trade-off: when your system prompt alone exceeds target_tokens, post-compaction count may stay over target rather than dropping system messages.Best practices for long-running agents
Best practices for long-running agents
- Keep
enable_iterative_summary=True(default) for context preservation - Use
focus_topicwhen discussing specific technical areas - Monitor
result.tool_results_prunedto track tool output efficiency - Set appropriate
min_savings_pctbased on your cost tolerance - Use structured templates for better organization
- Test topic changes to verify anti-injection works properly
Hooks
BEFORE_COMPACTIONandAFTER_COMPACTIONhook events now fire consistently around every compaction (both sync and async). See Hooks.
Policy vs. CompactionConfig — which should I use?
ContextCompactionPolicy is the proactive gate that runs before LLM calls. CompactionConfig runs after when compaction is actually needed. Both are compatible —execution.context_compaction is the proactive gate, Agent(context=...) runs after.
Related
Serialization
Intelligent compaction vs. plain summarize
| Feature | Basic Summarize | Intelligent Compaction |
|---|---|---|
| Summary Structure | Simple text blob | Emoji-tagged sections (topic, goals, decisions) |
| Context Preservation | Basic content | Topic, progress, action items, preferences |
| Narrative Continuity | Limited | High - maintains conversation flow |
| Best For | General conversations | Long planning sessions, iterative work |
Zero Performance Impact
Compaction uses lazy loading:Memory Management
Long-term memory storage and retrieval
Agent Configuration
Complete agent configuration options

