Quick Start
1. Enable Caching
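The original snippet for this step is not reproduced here; a minimal sketch, assuming a hypothetical `Agent` class with a `caching` flag (the import path, class, and method names are placeholders, not a confirmed API):

```python
# Hypothetical sketch: the import path, Agent class, `caching` flag, and
# start() method are assumptions; adjust to your framework's actual API.
from agents import Agent  # placeholder import

agent = Agent(
    instructions="You answer frequently asked questions.",
    caching=True,  # turn on local response caching
)

# The second identical request is served from the local cache.
print(agent.start("What are your opening hours?"))
print(agent.start("What are your opening hours?"))  # cache hit
```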
2. With Configuration
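A sketch of the configuration-driven variant, reusing the option names from the Configuration Options table below; the `cache_config` argument itself is an assumption:

```python
# Hypothetical sketch: option names mirror the Configuration Options table;
# the `cache_config` argument is an assumption, not a confirmed API.
from agents import Agent  # placeholder import

agent = Agent(
    instructions="You answer frequently asked questions.",
    cache_config={
        "enabled": True,         # response caching on
        "prompt_caching": None,  # auto-detect provider prompt caching
    },
)
```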
Cache Types
Response Caching
Stores LLM responses locally for identical requests:
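The framework's own implementation is not shown here; as a self-contained illustration of the idea, responses are stored under a key derived from the full request, so a byte-for-byte identical request never hits the API twice:

```python
import hashlib
import json

_response_cache: dict[str, str] = {}

def cached_completion(model: str, prompt: str, temperature: float, call_llm) -> str:
    """Return the cached response for an identical request; call the LLM otherwise."""
    key = hashlib.sha256(
        json.dumps(
            {"model": model, "prompt": prompt, "temperature": temperature},
            sort_keys=True,
        ).encode()
    ).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = call_llm(model, prompt, temperature)  # cache miss
    return _response_cache[key]  # cache hit on repeat requests
```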
Prompt Caching

Uses provider-specific caching (Anthropic, OpenAI):
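Prompt caching happens on the provider side. As one concrete illustration, Anthropic's Messages API marks a reusable prompt prefix with `cache_control` (a caching-enabled framework would normally add this marker for you when `prompt_caching` is on), while OpenAI caches long prompt prefixes automatically:

```python
# Illustration of Anthropic-style prompt caching; the surrounding framework
# normally sets the cache_control marker for you.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": "A very long, reused system prompt...",
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Summarize the policy above."}],
)
print(response.content[0].text)
```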
Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | True | Enable response caching |
| prompt_caching | bool | None | Provider prompt caching (auto-detect) |
Provider Support
| Provider | Response Cache | Prompt Cache |
|---|---|---|
| OpenAI | ✅ Local | ✅ Native |
| Anthropic | ✅ Local | ✅ Native |
| | ✅ Local | ⚠️ Limited |
| Ollama | ✅ Local | ❌ N/A |
Cache Benefits
| Benefit | Impact |
|---|---|
| Speed | Cached responses return in milliseconds |
| Cost | Avoid repeated API charges |
| Rate Limits | Reduce API request count |
| Reliability | Work offline with cached data |
When to Use Caching
✅ Enable Caching For
- FAQ bots
- Repeated queries
- Static content generation
- Development/testing
❌ Disable Caching For
- Real-time data needs
- Personalized responses
- Time-sensitive content
- Random/creative output
Cache Invalidation
Caches are invalidated when any of the following change (a key-derivation sketch follows the list):

- System prompt changes
- Model changes
- Temperature changes
- Tools change
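One way to see why these changes invalidate the cache (a sketch, not necessarily the framework's exact scheme): if the cache key is derived from all of these fields, changing any one of them produces a new key, so stale entries are simply never matched again.

```python
import hashlib
import json

def cache_key(system_prompt: str, model: str, temperature: float, tools: list[str]) -> str:
    """Derive a cache key from every field whose change should invalidate the cache."""
    payload = {
        "system_prompt": system_prompt,
        "model": model,
        "temperature": temperature,
        "tools": sorted(tools),  # order-insensitive
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

# Changing any input produces a different key, so the old entry is never hit.
assert cache_key("You are helpful.", "gpt-4o-mini", 0.0, []) != \
       cache_key("You are helpful.", "gpt-4o-mini", 0.7, [])
```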
Best Practices
Enable for FAQ-style agents
Agents that answer common questions benefit most from caching.
Use prompt caching for long system prompts
If your system prompt is large, enable prompt caching to reduce costs.
Disable for dynamic content
Don’t cache responses that should vary (time-sensitive, personalized).
Monitor cache hit rates
Track cache effectiveness to optimize your caching strategy.
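The framework's own metrics, if it exposes any, are not documented in this section; a self-contained way to track hit rate around whatever cache lookup you use:

```python
class CacheStats:
    """Count cache hits and misses to gauge how effective caching is."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
stats.record(hit=True)
stats.record(hit=False)
print(f"hit rate: {stats.hit_rate:.0%}")  # hit rate: 50%
```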

