Praison automatically arranges your agent’s system prompt, memory, and tools so that providers such as Anthropic, OpenAI, Bedrock, and Deepseek can cache the stable parts. No code change is needed.

Quick Start

1. Automatic Activation

Cache optimization is enabled automatically for supported models:
from praisonaiagents import Agent

agent = Agent(
    instructions="You are a helpful assistant",
    llm="anthropic/claude-sonnet-4-20250514",  # supports prompt caching
    memory=True,
)

agent.start("Summarize the latest report")
# Memory + tools laid out cache-friendly automatically

2. Explicit Configuration

Make cache optimization visible in your configuration:
from praisonaiagents import Agent, CachingConfig

agent = Agent(
    instructions="You are a helpful assistant", 
    llm="openai/gpt-4o",
    memory=True,
    caching=CachingConfig(prompt_caching=True)
)

agent.start("Analyze the data trends")

How It Works

Three Cache-Friendly Behaviors

| Behavior | What It Does | Impact |
|---|---|---|
| Deterministic tool order | Tools sorted by function name | Adding/reordering tools doesn’t break cache |
| Stable memory prefix | Memory sections in fixed order | Same context = identical prefix = cache hit |
| Cache boundary marker | Semantic split between stable/variable | Providers optimize caching automatically |
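
The first two behaviors can be illustrated with a small standalone sketch. It does not use Praison's internals: `build_prompt_prefix`, the three tool functions, and the memory section names are hypothetical, and the sketch only shows why sorting tools by function name and emitting memory sections in a fixed order produces an identical prefix regardless of input order.

```python
# Illustrative only: hypothetical helpers, not the SDK's implementation.

def search_tool(query: str) -> str:
    """Search the knowledge base."""
    return f"results for {query}"

def email_tool(to: str) -> str:
    """Send an email."""
    return f"sent to {to}"

def calendar_tool(day: str) -> str:
    """Look up calendar entries."""
    return f"events on {day}"

# Fixed section order (an assumption made for this sketch).
MEMORY_SECTION_ORDER = ("long_term", "entities", "short_term")

def build_prompt_prefix(instructions, tools, memory):
    # Sort tools by function name so the order in user code is irrelevant.
    ordered = sorted(tools, key=lambda t: t.__name__)
    tool_lines = [f"- {t.__name__}: {t.__doc__}" for t in ordered]
    # Emit memory sections in a fixed order, skipping empty ones.
    memory_lines = [
        f"[{name}] {memory[name]}"
        for name in MEMORY_SECTION_ORDER
        if memory.get(name)
    ]
    return "\n".join([instructions, "Tools:", *tool_lines, "Memory:", *memory_lines])

memory = {"short_term": "User asked about refunds", "long_term": "Prefers email"}
reversed_memory = dict(reversed(list(memory.items())))

prefix_a = build_prompt_prefix(
    "You are a helpful assistant", [search_tool, email_tool, calendar_tool], memory
)
prefix_b = build_prompt_prefix(
    "You are a helpful assistant", [calendar_tool, search_tool, email_tool], reversed_memory
)
print(prefix_a == prefix_b)  # True
```

Because both calls yield the same string, a provider that caches on an exact prefix match would see the same prefix for either tool ordering.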

Supported Models

Cache optimization works automatically with these providers:
| Provider | Models | Cache Type | Activation |
|---|---|---|---|
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | Automatic (≥1024 tokens) | Automatic |
| Anthropic | claude-sonnet-4, claude-opus-4, claude-3-5-* | Manual with cache_control | Manual + --prompt-caching |
| Bedrock | All models supporting caching | Manual | Manual |
| Deepseek | deepseek-chat, deepseek-coder | Automatic | Automatic |
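
For providers with manual activation, the cache boundary is marked explicitly in the request. The sketch below builds an Anthropic-style `system` payload by hand and places a `cache_control` marker on the stable block. `build_system_blocks` is a hypothetical helper written for this example; how Praison constructs the payload internally is not described on this page.

```python
import json

def build_system_blocks(stable_text: str, variable_text: str) -> list[dict]:
    """Split a system prompt into a cached stable block and an uncached variable block."""
    blocks = [
        {
            "type": "text",
            "text": stable_text,
            # Marks the end of the cacheable prefix in Anthropic's Messages API.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if variable_text:
        # Content after the marker is not part of the cached prefix.
        blocks.append({"type": "text", "text": variable_text})
    return blocks

system = build_system_blocks(
    stable_text="You are a helpful assistant.\nTools: calendar_tool, email_tool, search_tool",
    variable_text="Latest user context: order 1234",
)
print(json.dumps(system, indent=2))
```

Keeping everything that changes per turn in the second block means the first block stays identical from one request to the next.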
Check if your model supports caching:
from praisonaiagents.llm.model_capabilities import supports_prompt_caching

supports_prompt_caching("anthropic/claude-sonnet-4")  # True
supports_prompt_caching("openai/gpt-4o")              # True
supports_prompt_caching("ollama/llama3")              # False

Pipeline Integration

Cache optimization applies across all agent surfaces:
| Surface | Method | Cache-Optimized |
|---|---|---|
| Single agent | Agent._build_system_prompt | ✅ When model supports caching |
| Session | Session.get_context | ✅ Memory context optimized |
| API session | APISession.get_context | ✅ Memory context optimized |
| Multi-agent task | Agents._prepare_task_prompt | ✅ Task context optimized |
| Task chains | Task.execute_callback | ✅ Next-task context optimized |

User Interaction Flow

Consider a realistic customer support scenario in which a user asks three questions in a row.
What happens:
  1. First question: Memory context is empty, tools are sorted deterministically
  2. Second question: Memory grows but maintains stable order → cache hit on system prompt + memory prefix
  3. Third question: More memory added, but provider caches the stable portions → up to 90% cost reduction
Approximate savings:
  • Turn 1: Full cost (no cache)
  • Turn 2: 60-70% cost reduction
  • Turn 3+: 80-90% cost reduction
See detailed cost analysis in Prompt Caching CLI.
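
As a rough check on the figures above, the input-cost saving for a single turn can be estimated from how many tokens are served from cache and how much the provider discounts them. This is a back-of-the-envelope sketch: `input_cost_reduction` is a hypothetical helper, the token counts are invented, and the 10% cached-read price ratio is an assumption that varies by provider. Output tokens and any cache-write surcharge are ignored.

```python
def input_cost_reduction(cached_tokens: int, fresh_tokens: int, cached_price_ratio: float) -> float:
    """Fraction of input cost saved versus paying full price for every token."""
    total = cached_tokens + fresh_tokens
    if total == 0:
        return 0.0
    # Cached tokens are billed at a fraction of the normal input price.
    discounted = cached_tokens * cached_price_ratio + fresh_tokens
    return 1.0 - discounted / total

# Hypothetical turn: 9,000 prefix tokens read from cache, 1,000 new tokens,
# cached reads billed at 10% of the normal input price (assumed ratio).
saving = input_cost_reduction(9_000, 1_000, 0.10)
print(f"{saving:.0%}")  # 81%
```

The saving grows with the share of the prompt that is cached, which is why later turns with a long stable prefix benefit most.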

Best Practices

Cache optimization only works with models that support prompt caching. Use OpenAI (automatic), Anthropic (manual), Bedrock, or Deepseek.
# ✅ Good - supported model
agent = Agent(llm="openai/gpt-4o", memory=True)

# ❌ Won't cache - unsupported model  
agent = Agent(llm="ollama/llama3", memory=True)
Variable instructions break the cached prefix. Keep your system prompt consistent across turns.
# ✅ Good - stable instructions
agent = Agent(instructions="You are a helpful customer support agent")

# ❌ Bad - variable instructions
agent.instructions = f"You are helping {customer_name} at {timestamp}"
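
The effect of variable instructions can be made concrete by measuring how much of the prompt two consecutive turns share. The sketch below compares characters for simplicity, whereas providers match on token prefixes; the customer name, timestamps, and `shared_prefix_len` helper are hypothetical.

```python
import os

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))

stable = "You are a helpful customer support agent."

# Variable data embedded at the start: the prompts diverge almost immediately.
turn1_bad = f"You are helping Alice at 10:00. {stable}"
turn2_bad = f"You are helping Alice at 10:05. {stable}"

# Variable data appended after the stable instructions: the prefix is shared.
turn1_good = f"{stable}\nCustomer: Alice\nTime: 10:00"
turn2_good = f"{stable}\nCustomer: Alice\nTime: 10:05"

print(shared_prefix_len(turn1_bad, turn2_bad) < len(stable))     # True
print(shared_prefix_len(turn1_good, turn2_good) >= len(stable))  # True
```

In the second layout the whole instruction text remains a common prefix, so only the trailing per-turn details fall outside it.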
Memory activates the cache-optimized context path. Without memory, only tool sorting applies.
# ✅ Good - memory enables full optimization
agent = Agent(memory=True, llm="openai/gpt-4o")

# ❌ Limited - only tool sorting optimized
agent = Agent(memory=False, llm="openai/gpt-4o")
Don’t manually sort tool lists. The SDK sorts them deterministically by function name.
# ✅ Good - let SDK sort
tools = [search_tool, email_tool, calendar_tool]
agent = Agent(tools=tools)

# ❌ Bad - manual sorting breaks cache consistency
tools.sort(key=lambda t: t.__name__)

Agent Caching

Core caching concepts and configuration options

Prompt Caching CLI

Enable caching via command line with cost analysis