Prompt caching reduces LLM token costs by reusing stable context across conversation turns, so the full price is paid only once for an unchanging prompt prefix.

Quick Start

1. Use deterministic memory ordering

from praisonaiagents import Agent

agent = Agent(
    name="Researcher",
    instructions="Answer research questions using memory and tools.",
    memory=True,
)

# Memory results are sorted deterministically for cache consistency
agent.start("What did we learn last week about prompt caching?")
Memory search results maintain consistent ordering to improve cache hit rates on Anthropic/Google models.

2. Manual prompt assembly for caching

from praisonaiagents import Agent
from praisonaiagents.memory import Memory

memory = Memory(config={"provider": "rag"})

# Get consistent memory context
context = memory.build_context_for_task(
    task_descr="Summarise our research notes",
    user_id="user_123",
    max_items=3,
    include_in_output=True  # Force include for prompt assembly
)

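# Construct a cache-friendly prompt: stable content first, dynamic input last.
# The doubled braces leave a literal {user_input} placeholder to fill in per turn.
system_prompt = f"System instructions and tools\n\n{context}\n\nUser message: {{user_input}}"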

How It Works

| Component | Caching behavior |
| --- | --- |
| Memory search results | Returned in deterministic order based on content hashing and timestamps |
| Tool schemas | Consistent ordering for reproducible prompts |
| Context structure | Stable system prompt + memory + dynamic user input |
| Cache effectiveness | High hit rates when underlying data is unchanged |
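
The deterministic ordering in the first row can be pictured with a small sketch. This is not PraisonAI's actual implementation: the `order_memories` helper, the `content` and `timestamp` field names, and the choice of SHA-256 are assumptions made for illustration. The point is that a sort key derived only from the data makes the result independent of the order in which a search backend happens to return items.

```python
import hashlib


def order_memories(results):
    """Return memory results in an order that depends only on their data.

    Hypothetical helper: each result is assumed to be a dict with
    'content' (str) and 'timestamp' (float) keys.
    """
    def sort_key(item):
        digest = hashlib.sha256(item["content"].encode("utf-8")).hexdigest()
        # Timestamp first, content hash as a tie-breaker.
        return (item["timestamp"], digest)

    return sorted(results, key=sort_key)


results_a = [
    {"content": "Caching cuts repeated prefix cost", "timestamp": 2.0},
    {"content": "Anthropic supports prompt caching", "timestamp": 1.0},
    {"content": "Keep tools in a fixed order", "timestamp": 2.0},
]
results_b = list(reversed(results_a))  # same data, different arrival order

ordered_a = order_memories(results_a)
ordered_b = order_memories(results_b)
print(ordered_a == ordered_b)
```

Because both orderings are identical, any context string built from them is identical too, which is what a prefix cache needs.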

Configuration Options

The build_context_for_task() method provides these parameters for cache-optimized context:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| task_descr | str | (no default) | Task description for memory search |
| user_id | Optional[str] | None | Optional user ID for personalised memory |
| additional | str | "" | Additional context to include in search |
| max_items | int | 3 | Maximum items per memory category |
| include_in_output | Optional[bool] | None | Whether to include memory content in output (set to True for manual prompt assembly) |
Returns: str — deterministically ordered context string combining relevant memories

Common Patterns

Multi-turn chat with memory

Setting memory=True is enough; deterministic ordering is applied automatically.
from praisonaiagents import Agent

agent = Agent(
    name="Assistant",
    instructions="Help with research tasks using memory.",
    memory=True,
)

# Each turn automatically gets cache-friendly prefixes
agent.start("Research prompt caching benefits")
agent.start("What are the cost savings?")  # Stable prefix is eligible for a cache hit on Anthropic/Google

Manual prompt assembly for custom LLM calls

Use build_context_for_task() with explicit output control for manual prompt construction.
from praisonaiagents.memory import Memory

memory = Memory(config={"provider": "rag"})

context = memory.build_context_for_task(
    task_descr="Analyse user feedback",
    max_items=5,
    include_in_output=True  # Force memory content inclusion
)

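# Example user message for this turn (placeholder value; replace with real input)
user_input = "Which feedback themes came up most often?"

# Construct a cache-friendly prompt: stable content first, dynamic input last
system_prompt = f"""System: You are an AI assistant.

{context}

User: {user_input}"""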

Tool schema consistency

Tools are ordered consistently to maintain prompt cache effectiveness across invocations.
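
As a rough illustration of why ordering matters, the sketch below serializes a set of tool schemas canonically so that the same tools always produce byte-identical text. The `canonical_tools` helper and the two example schemas are hypothetical and do not reflect how the framework orders tools internally; only standard-library `json` and `hashlib` calls are used.

```python
import hashlib
import json


def canonical_tools(tools):
    """Serialize tool schemas so equal tool sets yield byte-identical text.

    Hypothetical helper: each tool is assumed to be a dict with a 'name' key.
    """
    ordered = sorted(tools, key=lambda tool: tool["name"])
    # sort_keys fixes key order inside each schema; separators fix whitespace.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


search = {
    "name": "search",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
}
calc = {
    "parameters": {"type": "object", "properties": {"expr": {"type": "string"}}},
    "name": "calculator",
}

first = canonical_tools([search, calc])
second = canonical_tools([calc, search])  # same tools, different registration order

# A short fingerprint makes it easy to spot an accidental change between turns.
fingerprint = hashlib.sha256(first.encode("utf-8")).hexdigest()
print(first == second, fingerprint[:12])
```

Comparing fingerprints across turns is one simple way to confirm that the tool section of a prompt has not drifted.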

Best Practices

Minimize memory updates during conversation turns to maintain cache consistency. Update memories at conversation boundaries.
Place stable system content (instructions, tools, static memory) before dynamic content (user messages, fresh data).
Keep max_items and other context parameters constant between turns; varying them changes the context and reduces cache effectiveness.
The framework automatically ensures consistent ordering for memory search results and tool schemas to optimize cache hits.
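
The "stable content before dynamic content" practice can be checked with a minimal sketch. The `build_prompt` helper and the sample strings are hypothetical, and providers cache on tokens rather than characters, so the character count below is only a rough proxy for how much of a prompt could be reused.

```python
import os


def build_prompt(instructions, tools_text, memory_context, user_input):
    """Assemble a prompt with stable parts first and the user message last."""
    stable_prefix = f"{instructions}\n\n{tools_text}\n\n{memory_context}\n\n"
    return stable_prefix + f"User: {user_input}"


instructions = "You are a research assistant."
tools_text = "Tools: calculator, search"
memory_context = "Memory: prompt caching reuses unchanged prefixes."

turn_1 = build_prompt(instructions, tools_text, memory_context, "What is prompt caching?")
turn_2 = build_prompt(instructions, tools_text, memory_context, "How much does it save?")

# os.path.commonprefix compares the strings character by character.
shared = os.path.commonprefix([turn_1, turn_2])
print(f"{len(shared)} of {len(turn_1)} characters are shared with the next turn")
```

If the user message were placed first instead, the shared prefix would shrink to a few characters, and little of the prompt could be served from cache.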

Advanced Memory

Memory configuration and search strategies

Stateful Agents

Building agents that maintain conversation state