The `--prompt-caching` flag enables prompt caching, which reduces token costs when the same long system prompt is sent across repeated requests.
## Quick Start

```bash
praisonai "Analyze this document..." --prompt-caching --llm anthropic/claude-sonnet-4-20250514
```
## Basic Prompt Caching

```bash
praisonai "Analyze this document..." --prompt-caching --llm anthropic/claude-sonnet-4-20250514
```
Expected Output:
```
💾 Prompt Caching enabled
╭─ Agent Info ──────────────────────────────────────────────────────────────╮
│ 🤖 Agent: DirectAgent                                                      │
│ Role: Assistant                                                            │
│ Model: anthropic/claude-sonnet-4-20250514                                  │
│ Prompt Caching: Enabled                                                    │
╰────────────────────────────────────────────────────────────────────────────╯
╭─ Cache Status ─────────────────────────────────────────────────────────────╮
│ 📊 Cache hit: System prompt (1,024 tokens saved)                           │
╰────────────────────────────────────────────────────────────────────────────╯
```
## Combine with Metrics

```bash
# See cost savings with metrics
praisonai "Process data..." --prompt-caching --metrics --llm anthropic/claude-sonnet-4-20250514
```
## Supported Providers

| Provider | Support | Notes |
|---|---|---|
| OpenAI | Auto | Automatic caching for repeated prompts |
| Anthropic | Manual | Explicit caching with `--prompt-caching` |
| Bedrock | Manual | Explicit caching support |
| Deepseek | Manual | Explicit caching support |
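For the providers marked Manual, the request itself flags which content is cacheable. With Anthropic this corresponds to `cache_control` blocks in the Messages API; the sketch below calls the Anthropic SDK directly (outside PraisonAI) to show what the flag does under the hood. Treat it as an illustration of the mechanism, not PraisonAI's exact internals.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Mark the long system prompt as cacheable with a cache_control block.
# Subsequent requests with an identical prefix read from the server-side cache.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant..." * 50,  # long, stable prompt
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Analyze this document..."}],
)
```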
## How It Works

- Enable: The `--prompt-caching` flag activates caching
- Hash: The system prompt is hashed for cache lookup
- Check: The provider checks whether the prompt is already cached
- Reuse: Cached prompts skip re-processing
- Save: Token costs drop for the cached portions
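The sketch below illustrates this hash-and-lookup flow. It is a simplified, client-side illustration, not PraisonAI's actual implementation; the supported providers manage the cache server-side.

```python
import hashlib

# Illustrative only: a minimal in-memory cache keyed by the system
# prompt's hash, mirroring the lookup flow described above.
_prompt_cache: dict[str, str] = {}

def cache_key(system_prompt: str) -> str:
    # Hash the system prompt so identical prompts map to the same entry
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()

def lookup(system_prompt: str) -> bool:
    # Check: is this exact prompt already cached?
    key = cache_key(system_prompt)
    if key in _prompt_cache:
        return True  # Reuse: skip re-processing, save tokens
    _prompt_cache[key] = system_prompt  # First call: write the cache entry
    return False
```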
## Cost Savings
Prompt caching can significantly reduce costs for:
| Scenario | Savings |
|---|---|
| Long system prompts | Up to 90% |
| Repeated instructions | Up to 80% |
| Document analysis | Up to 70% |
| Multi-turn conversations | Up to 50% |
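To make these figures concrete, here is a rough back-of-the-envelope estimate. The dollar figures and multipliers are assumptions for illustration (they follow Anthropic's published pattern of pricier cache writes and heavily discounted cache reads), not current rate-card prices.

```python
# Back-of-the-envelope savings estimate; treat all figures as assumptions.
BASE_INPUT = 3.00                  # $ per million input tokens (assumed)
CACHE_WRITE = BASE_INPUT * 1.25    # first request writes the cache
CACHE_READ = BASE_INPUT * 0.10     # later requests read from it

prompt_tokens = 50_000             # long system prompt
calls = 20                         # queries in one session

uncached = calls * prompt_tokens * BASE_INPUT / 1_000_000
cached = (prompt_tokens * CACHE_WRITE
          + (calls - 1) * prompt_tokens * CACHE_READ) / 1_000_000

print(f"without caching: ${uncached:.2f}")   # $3.00
print(f"with caching:    ${cached:.2f}")     # $0.47
print(f"savings:         {100 * (1 - cached / uncached):.0f}%")  # 84%
```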
## Examples

### Long System Prompt

```bash
# Agent with extensive instructions benefits from caching
praisonai "Answer questions about the codebase" \
  --prompt-caching --llm anthropic/claude-sonnet-4-20250514
```
### Document Analysis

```bash
# Repeated analysis of the same document
praisonai "Find security issues in this code..." \
  --prompt-caching --llm anthropic/claude-sonnet-4-20250514
```
### Multi-Query Session

```bash
# Multiple queries with the same context
praisonai "Query 1..." --prompt-caching --llm anthropic/claude-sonnet-4-20250514
praisonai "Query 2..." --prompt-caching --llm anthropic/claude-sonnet-4-20250514
praisonai "Query 3..." --prompt-caching --llm anthropic/claude-sonnet-4-20250514
```
## Programmatic Usage

```python
from praisonaiagents import Agent

agent = Agent(
    instructions="You are an AI assistant..." * 50,  # Long system prompt
    llm="anthropic/claude-sonnet-4-20250514",
    prompt_caching=True,
)

# First call caches the prompt
result1 = agent.start("Question 1")

# Subsequent calls reuse the cached prompt
result2 = agent.start("Question 2")  # Reduced cost
result3 = agent.start("Question 3")  # Reduced cost
```
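Note that prefix-based prompt caching generally requires the cached content to be byte-for-byte identical between calls, so keep the `instructions` string stable; changing even one character produces a new cache entry.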
## Best Practices
Use prompt caching when you have long system prompts or make repeated calls with the same context.
Caching is most effective for stable prompts. Frequently changing prompts won't benefit from caching.
| Do | Don't |
|---|---|
| Use for long system prompts | Use for short prompts |
| Use for repeated queries | Use for one-off queries |
| Combine with `--metrics` to track savings | Ignore cost monitoring |
| Use stable instructions | Change prompts frequently |
## Cache Behavior
| Provider | Cache Duration | Cache Scope |
|---|---|---|
| OpenAI | Automatic | Per-request |
| Anthropic | 5 minutes | Per-session |
| Bedrock | Configurable | Per-session |
| Deepseek | 5 minutes | Per-session |
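For Anthropic, the five-minute window is measured from the cache entry's last use, and each cache hit refreshes it, so issuing related queries back-to-back keeps the cache warm for an entire session.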