Caching improves performance and reduces costs by reusing previous responses and leveraging provider-specific prompt caching.

Quick Start

1. Enable Caching

from praisonaiagents import Agent

agent = Agent(
    name="Cached Agent",
    instructions="You answer questions",
    caching=True  # Enable response caching
)
2. With Configuration

from praisonaiagents import Agent, CachingConfig

agent = Agent(
    name="Optimized Agent",
    instructions="You process data efficiently",
    caching=CachingConfig(
        enabled=True,           # Response caching
        prompt_caching=True,    # Provider prompt caching
    )
)

Cache Types

Response Caching

Stores LLM responses locally for identical requests:
agent = Agent(
    instructions="You answer FAQs",
    caching=CachingConfig(enabled=True)
)

# First call - hits LLM
agent.chat("What is Python?")  # ~500ms

# Second call - returns cached
agent.chat("What is Python?")  # ~5ms

Prompt Caching

Uses provider-specific caching (Anthropic, OpenAI):
agent = Agent(
    instructions="You are an expert assistant...",  # Long system prompt
    caching=CachingConfig(prompt_caching=True)
)

# Provider caches the system prompt
# Subsequent calls reuse cached prompt tokens
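Under the hood, providers expose this explicitly. With Anthropic, for example, a cache_control breakpoint marks the system prompt as cacheable; the raw-SDK sketch below shows the mechanism PraisonAI leverages on your behalf (the model name and LONG_SYSTEM_PROMPT are placeholders):

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=256,
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,  # placeholder for your long system prompt
        "cache_control": {"type": "ephemeral"},  # marks this prefix as cacheable
    }],
    messages=[{"role": "user", "content": "What is Python?"}],
)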

Configuration Options

from praisonaiagents import CachingConfig

config = CachingConfig(
    enabled=True,          # Enable response caching
    prompt_caching=None,   # Provider prompt caching (None = auto)
)
Option           Type   Default   Description
enabled          bool   True      Enable response caching
prompt_caching   bool   None      Provider prompt caching (auto-detect)
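Leaving prompt_caching as None lets the library decide per provider. A plausible sketch of that auto-detection, based on the provider support table below (hypothetical, not the library's actual code):

def resolve_prompt_caching(provider: str, prompt_caching: bool | None) -> bool:
    # Honor an explicit setting; otherwise enable prompt caching only
    # where the provider supports it natively (see the table below)
    if prompt_caching is not None:
        return prompt_caching
    return provider.lower() in {"openai", "anthropic"}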

Provider Support

Provider    Response Cache   Prompt Cache
OpenAI      ✅ Local         ✅ Native
Anthropic   ✅ Local         ✅ Native
Google      ✅ Local         ⚠️ Limited
Ollama      ✅ Local         ❌ N/A

Cache Benefits

Benefit       Impact
Speed         Cached responses return in milliseconds
Cost          Avoid repeated API charges
Rate Limits   Reduce API request count
Reliability   Work offline with cached data

When to Use Caching

✅ Enable Caching For

  • FAQ bots
  • Repeated queries
  • Static content generation
  • Development/testing

❌ Disable Caching For

  • Real-time data needs
  • Personalized responses
  • Time-sensitive content
  • Random/creative output
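For example, an agent that serves live data can opt out entirely. This assumes caching=False is accepted the same way caching=True is:

from praisonaiagents import Agent

agent = Agent(
    name="Live Agent",
    instructions="You report current stock prices",
    caching=False  # assumed opt-out; every call goes to the LLM
)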

Cache Invalidation

Caches are invalidated when:
  • System prompt changes
  • Model changes
  • Temperature changes
  • Tools change
# These create different cache entries
agent.chat("Hello", temperature=0.0)  # Cache entry 1
agent.chat("Hello", temperature=0.7)  # Cache entry 2

Best Practices

  • Agents that answer common questions benefit most from caching.
  • If your system prompt is large, enable prompt caching to reduce costs.
  • Don’t cache responses that should vary (time-sensitive, personalized).
  • Track cache effectiveness to optimize your caching strategy (see the timing sketch below).
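A simple way to track effectiveness is to time repeated calls: cached responses should come back in milliseconds. A rough sketch using only the wall clock (no metrics API is assumed):

import time

def timed_chat(agent, message: str) -> str:
    start = time.perf_counter()
    reply = agent.chat(message)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{elapsed_ms:.1f} ms for {message!r}")  # ~ms latency implies a cache hit
    return reply

timed_chat(agent, "What is Python?")  # first call: full LLM latency
timed_chat(agent, "What is Python?")  # repeat: should be cache-fast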