Agent Caching

Caching improves performance and reduces costs by reusing previous responses and leveraging provider-specific prompt caching.

Quick Start

Enable Caching

from praisonaiagents import Agent

agent = Agent(
    name="Cached Agent",
    instructions="You answer questions",
    caching=True  # Enable response caching
)

With Configuration

from praisonaiagents import Agent, CachingConfig

agent = Agent(
    name="Optimized Agent",
    instructions="You process data efficiently",
    caching=CachingConfig(
        enabled=True,           # Response caching
        prompt_caching=True,    # Provider prompt caching
    )
)

Cache Types

Response Caching

Stores LLM responses locally for identical requests:

agent = Agent(
    instructions="You answer FAQs",
    caching=CachingConfig(enabled=True)
)

# First call - hits LLM
agent.chat("What is Python?")  # ~500ms

# Second call - returns cached
agent.chat("What is Python?")  # ~5ms

Prompt Caching

Uses provider-specific caching (Anthropic, OpenAI):

agent = Agent(
    instructions="You are an expert assistant...",  # Long system prompt
    caching=CachingConfig(prompt_caching=True)
)

# Provider caches the system prompt
# Subsequent calls reuse cached prompt tokens

Configuration Options

from praisonaiagents import CachingConfig

config = CachingConfig(
    enabled=True,          # Enable response caching
    prompt_caching=None,   # Provider prompt caching (None = auto)
)

Option	Type	Default	Description
`enabled`	`bool`	`True`	Enable response caching
`prompt_caching`	`bool`	`None`	Provider prompt caching (auto-detect)

Provider Support

Provider	Response Cache	Prompt Cache
OpenAI	✅ Local	✅ Native
Anthropic	✅ Local	✅ Native
Google	✅ Local	⚠️ Limited
Ollama	✅ Local	❌ N/A

Cache Benefits

Benefit	Impact
Speed	Cached responses return in milliseconds
Cost	Avoid repeated API charges
Rate Limits	Reduce API request count
Reliability	Work offline with cached data

When to Use Caching

✅ Enable Caching For

FAQ bots
Repeated queries
Static content generation
Development/testing

❌ Disable Caching For

Real-time data needs
Personalized responses
Time-sensitive content
Random/creative output

Cache Invalidation

Caches are invalidated when:

System prompt changes
Model changes
Temperature changes
Tools change

# These create different cache entries
agent.chat("Hello", temperature=0.0)  # Cache entry 1
agent.chat("Hello", temperature=0.7)  # Cache entry 2

Best Practices

Enable for FAQ-style agents

Agents that answer common questions benefit most from caching.

Use prompt caching for long system prompts

If your system prompt is large, enable prompt caching to reduce costs.

Disable for dynamic content

Don’t cache responses that should vary (time-sensitive, personalized).

Monitor cache hit rates

Track cache effectiveness to optimize your caching strategy.

Execution

Performance limits

Memory

Persistent storage

Getting Started

Core Concepts

Guides

Features

Models

Databases

Observability

Memory

Knowledge

RAG

Persistence

Tools

Other Features

Developers

Configuration

Best Practices

Getting Started (No Code)

Quick Start

Cache Types

Response Caching

Prompt Caching

Configuration Options

Provider Support

Cache Benefits

When to Use Caching

✅ Enable Caching For

❌ Disable Caching For

Cache Invalidation

Best Practices

Execution

Memory

Getting Started

Core Concepts

Guides

Features

Models

Databases

Observability

Memory

Knowledge

RAG

Persistence

Tools

Other Features

Developers

Configuration

Best Practices

Getting Started (No Code)

​Quick Start

​Cache Types

​Response Caching

​Prompt Caching

​Configuration Options

​Provider Support

​Cache Benefits

​When to Use Caching

✅ Enable Caching For

❌ Disable Caching For

​Cache Invalidation

​Best Practices

​Related

Execution

Memory

Quick Start

Cache Types

Response Caching

Prompt Caching

Configuration Options

Provider Support

Cache Benefits

When to Use Caching

Cache Invalidation

Best Practices

Related