Quick Start
How It Works
Three Cache-Friendly Behaviors
| Behavior | What It Does | Impact |
|---|---|---|
| Deterministic tool order | Tools sorted by function name | Adding/reordering tools doesn’t break cache |
| Stable memory prefix | Memory sections in fixed order | Same context = identical prefix = cache hit |
| Cache boundary marker | Semantic split between stable/variable | Providers optimize caching automatically |
Supported Models
Cache optimization works automatically with these providers:| Provider | Models | Cache Type | Activation |
|---|---|---|---|
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | Automatic (≥1024 tokens) | Automatic |
| Anthropic | claude-sonnet-4, claude-opus-4, claude-3-5-* | Manual with cache_control | Manual + --prompt-caching |
| Bedrock | All models supporting caching | Manual | Manual |
| Deepseek | deepseek-chat, deepseek-coder | Automatic | Automatic |
Pipeline Integration
Cache optimization applies across all agent surfaces:| Surface | Method | Cache-Optimized |
|---|---|---|
| Single agent | Agent._build_system_prompt | ✅ When model supports caching |
| Session | Session.get_context | ✅ Memory context optimized |
| API session | APISession.get_context | ✅ Memory context optimized |
| Multi-agent task | Agents._prepare_task_prompt | ✅ Task context optimized |
| Task chains | Task.execute_callback | ✅ Next-task context optimized |
User Interaction Flow
Here’s a realistic customer support scenario showing cache optimization in action: What happens:- First question: Memory context is empty, tools are sorted deterministically
- Second question: Memory grows but maintains stable order → cache hit on system prompt + memory prefix
- Third question: More memory added, but provider caches the stable portions → up to 90% cost reduction
- Turn 1: Full cost (no cache)
- Turn 2: 60-70% cost reduction
- Turn 3+: 80-90% cost reduction
Best Practices
Use supported models
Use supported models
Cache optimization only works with models that support prompt caching. Use OpenAI (automatic), Anthropic (manual), Bedrock, or Deepseek.
Keep instructions stable
Keep instructions stable
Variable instructions break the cached prefix. Keep your system prompt consistent across turns.
Enable memory for optimization
Enable memory for optimization
Memory activates the cache-optimized context path. Without memory, only tool sorting applies.
Let the SDK sort tools
Let the SDK sort tools
Don’t manually sort tool lists. The SDK sorts them deterministically by function name.
Related
Agent Caching
Core caching concepts and configuration options
Prompt Caching CLI
Enable caching via command line with cost analysis

