Skip to main content
LLM Context Compression intelligently summarizes long conversation history while preserving the system prompt and recent context, with session lineage for traceability.

Quick Start

1

Simplest — enable via Agent

from praisonaiagents import Agent, ManagerConfig

agent = Agent(
    name="Researcher",
    instructions="Research topics in depth across many turns.",
    context=ManagerConfig(
        auto_compact=True,
        compact_threshold=0.8,
        strategy="summarize",  # uses LLM compression when available
        llm_summarize=True,
    ),
)

agent.start("Walk me through the entire history of AI safety research...")
2

Full control with LLMContextCompressorOptimizer

from praisonaiagents.context.optimizer import LLMContextCompressorOptimizer

optimizer = LLMContextCompressorOptimizer(
    llm_client=agent.llm,                # reuse the agent's LLM
    auxiliary_model="gpt-4o-mini",       # cheaper model for summarization
    protect_last_n_tokens=20_000,
    summary_target_tokens=750,
    enable_session_tracking=True,
)

optimized_messages, result = optimizer.optimize(messages, target_tokens=8_000)
print(f"Saved {result.tokens_saved} tokens — strategy: {result.strategy_used}")

How It Works

PhaseWhat happens
Head protectSystem prompt + first turns kept verbatim
Tail protectLast protect_last_n_tokens kept verbatim
Middle compressLLM call produces a summary_target_tokens summary
Session recordCompressionSession appended with parent/child link

Configuration Options

OptionTypeDefaultDescription
llm_clientLLM clientNoneProvider used for summarization (uses deterministic fallback if None)
auxiliary_modelstr"gpt-4o-mini"Model used for the summarization call (often a cheaper model than the agent’s main LLM)
protect_last_n_tokensint20_000Tokens to preserve at the tail (recent context)
summary_target_tokensint750Target tokens for the middle summary
enable_session_trackingboolTrueAppend CompressionSession entries for traceability
use_accurate_tokenizerboolTrueUse model-specific tokenizer; falls back to heuristic on import failure
The LLMContextCompressorOptimizer is exposed as LLM_CONTEXT_COMPRESSOR_OPTIMIZER and is not in OPTIMIZER_REGISTRY — users must instantiate it directly with an llm_client.

Session Lineage

Track compression history and audit trails across repeated compactions:
# Access session history
compressor = ContextCompressor(llm=agent.llm)
result = await compressor.compress(messages)

# View compression sessions
for session in compressor.get_session_history():
    print(f"Session {session.session_id[:8]}: {session.original_tokens}{session.compressed_tokens}")

# Chain sessions across compactions
next_result = await compressor.compress(
    result.messages,
    parent_session_id=result.session_id  # maintain audit trail
)
CompressionSession shape:
  • session_id: Unique identifier for this compression
  • parent_session_id: ID of previous compression for lineage
  • created_at: Timestamp
  • original_message_count / compressed_message_count: Message counts
  • original_tokens / compressed_tokens: Token counts
  • summary_text: The LLM-generated summary content

CompressResult

FieldTypeDescription
messagesList[Dict[str, Any]]Compressed message list (head + summary + tail)
tokens_savedintNumber of tokens removed
original_tokensintToken count before compression
final_tokensintToken count after compression
compression_ratiofloatFinal tokens / original tokens
session_idOptional[str]ID of this compression session
parent_session_idOptional[str]ID of parent compression session
summary_token_countintTokens used by the summary
head_preserved_countintNumber of head messages preserved
tail_preserved_countintNumber of tail messages preserved
middle_compressed_countintNumber of middle messages compressed
compression_efficiencyfloatPercentage of tokens saved (property)

Common Patterns

Use a cheap auxiliary model:
optimizer = LLMContextCompressorOptimizer(
    llm_client=agent.llm,           # Main agent uses gpt-4o
    auxiliary_model="gpt-4o-mini",  # Compression uses cheaper model
)
Tune for tool-heavy loops:
# Preserve more recent context when tools are frequently used
optimizer = LLMContextCompressorOptimizer(
    protect_last_n_tokens=30_000,  # Keep more recent tool results
    summary_target_tokens=1000,    # Slightly longer summaries
)
Chain sessions across compactions:
result1 = await compressor.compress(messages)
result2 = await compressor.compress(
    result1.messages,
    parent_session_id=result1.session_id  # maintain lineage
)

Best Practices

Use a cost-effective model like gpt-4o-mini for summarization while your main agent runs on gpt-4o or similar. This reduces costs without significantly impacting summary quality.
Keep summary_target_tokens at least 500 tokens. Summaries lose critical context below this threshold, leading to poor conversation continuity.
Set use_accurate_tokenizer=True for production deployments. This provides more accurate token budget calculations and better compression efficiency.
Monitor result.compression_efficiency to detect ineffective compactions. Values below 20% may indicate the conversation doesn’t benefit from compression.

Intelligent Compaction

Structured conversation summaries with topic/goal tracking

Context Optimizer

Overview of all optimization strategies including LLM compression

Context Strategies

Choosing the right optimization approach for your use case

Context Management

Complete guide to context window management features