Zep memory APIs return recent messages and compressed context summaries, giving agents efficient access to both short-term and long-term memory.

Quick Start

1. Install Zep Client

pip install zep-python

2. Basic Memory Retrieval

from praisonaiagents import Agent
from zep_python import ZepClient

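# Initialize Zep client.
# Note: constructor arguments and method names differ between zep-python
# releases; check the calls on this page against your installed version.
zep_client = ZepClient(api_url="your-zep-url", api_key="your-api-key")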

def get_memory_context(session_id: str, user_id: str):
    """Retrieve both messages and context from Zep"""
    # Get recent messages (authoritative for short-horizon recall)
    messages = zep_client.memory.get_messages(
        session_id=session_id,
        limit=10  # Last 10 turns
    )
    
    # Get context summary (compressed long-term memory)
    context = zep_client.memory.get_context(
        session_id=session_id,
        user_id=user_id
    )
    
    return messages, context

agent = Agent(
    name="Memory Assistant",
    instructions="Use both recent messages and long-term context effectively"
)

3. Dual-Read Strategy

def merge_memory_sources(session_id: str, user_id: str) -> str:
    """Recommended merge policy for Zep memory"""
    messages, context = get_memory_context(session_id, user_id)
    
    # Build deterministic timeline
    memory_prompt = "[system note: chronological messages follow]\n"
    
    # Add recent messages (never silently drop)
    for msg in messages:
        memory_prompt += f"<message author={msg.role}>{msg.content}</message>\n"
    
    # Add long-term context summary
    if context.summary:
        memory_prompt += "\n[system note: long-term distilled context]\n"
        memory_prompt += f"<summary>{context.summary}</summary>\n"
    
    return memory_prompt
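
To check the prompt layout without a running Zep server, the merge policy above can be exercised on stand-in data. The sketch below is an illustration only: `render_memory_prompt` is a hypothetical helper, not part of zep-python or PraisonAI. It takes plain (role, content) pairs and an optional summary, and is meant to produce the same layout as `merge_memory_sources`.

```python
from typing import Iterable, Optional, Tuple


def render_memory_prompt(
    messages: Iterable[Tuple[str, str]],
    summary: Optional[str] = None,
) -> str:
    """Render (role, content) pairs plus an optional summary in the dual-read layout."""
    lines = ["[system note: chronological messages follow]"]
    # Every recent message is kept, in the order given.
    for role, content in messages:
        lines.append(f"<message author={role}>{content}</message>")
    # The summary section is added only when a summary exists.
    if summary:
        lines.append("")
        lines.append("[system note: long-term distilled context]")
        lines.append(f"<summary>{summary}</summary>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_memory_prompt(
        [("user", "I prefer metric units."), ("assistant", "Noted.")],
        summary="User is planning a cycling trip.",
    ))
```

Keeping the rendering step free of client calls makes the prompt format easy to unit test separately from retrieval.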

How It Works

| Memory Type | Purpose | Freshness | Use Case |
| --- | --- | --- | --- |
| Messages | Short-horizon recall | Real-time | Recent conversations, immediate context |
| Context | Long-term memory | May lag in cloud | Historical facts, user preferences |

Agent Integration Patterns

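from datetime import datetime, timedelta, timezone

def get_windowed_memory(session_id: str, hours: int = 24):
    """Get messages from a specific time window, plus context"""
    # Use a timezone-aware UTC cutoff so the window does not depend on local time
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)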
    
    # Recent messages within time window
    messages = zep_client.memory.get_messages(
        session_id=session_id,
        created_after=cutoff
    )
    
    # Context for everything before window
    context = zep_client.memory.get_context(session_id=session_id)
    
    return messages, context

agent = Agent(
    name="Windowed Memory Agent",
    instructions="Use 24-hour message window with historical context"
)
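
If the installed client does not accept a `created_after` filter (worth checking against your zep-python version), the same window can be applied client-side after fetching. The sketch below is a hedged illustration: `StoredMessage` and `filter_recent` are hypothetical names, and it assumes each message carries a timezone-aware `created_at` datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class StoredMessage:
    """Stand-in for a Zep message; only the fields used here."""
    role: str
    content: str
    created_at: datetime


def filter_recent(
    messages: List[StoredMessage],
    hours: int = 24,
    now: Optional[datetime] = None,
) -> List[StoredMessage]:
    """Keep messages created within the last `hours`, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    recent = [m for m in messages if m.created_at >= cutoff]
    return sorted(recent, key=lambda m: m.created_at)
```

Passing `now` explicitly keeps the function deterministic in tests.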

Failure Modes & Solutions

Symptom: Agent forgets recent conversation turns
Cause: Cloud Zep deployments may have lag between message ingestion and summary generation
Solution:
def lag_resistant_memory(session_id: str):
    """Always prioritize raw messages over potentially stale context"""
    messages = zep_client.memory.get_messages(
        session_id=session_id,
        limit=20  # Larger window for safety
    )
    
    context = zep_client.memory.get_context(session_id=session_id)
    
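    # Verify context freshness against the newest message,
    # without assuming the order in which messages are returned
    if messages and context is not None and context.last_updated:
        latest_message_time = max(msg.created_at for msg in messages)
        if context.last_updated < latest_message_time:
            # Context is stale, so return messages only
            return messages, None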
    
    return messages, context
Symptom: Hitting LLM context limits due to verbose memory
Cause: Including both full message history and redundant context
Solution:
def compressed_memory(session_id: str, target_tokens: int = 1500):
    """Use sliding window + compressed remainder"""
    # Always include last 5 messages (critical recent context)
    recent_messages = zep_client.memory.get_messages(
        session_id=session_id,
        limit=5
    )
    
    # Use context for older history
    context = zep_client.memory.get_context(session_id=session_id)
    
    # Estimate token usage and truncate the summary if needed
    message_tokens = sum(estimate_tokens(msg.content) for msg in recent_messages)
    summary = context.summary or ""
    total_tokens = message_tokens + estimate_tokens(summary)

    if summary and total_tokens > target_tokens:
        # Give the summary whatever budget remains after the messages (never negative)
        max_context_tokens = max(0, target_tokens - message_tokens)
        context.summary = truncate_to_tokens(summary, max_context_tokens)
    
    return recent_messages, context
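
`estimate_tokens` and `truncate_to_tokens` are not defined in the example above and are not part of zep-python. A minimal sketch of both follows, assuming a rough heuristic of about four characters per token; swap in your model's tokenizer when accurate counts matter.

```python
CHARS_PER_TOKEN = 4  # Rough heuristic (assumption), not a real tokenizer.


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    if not text:
        return 0
    # Ceiling division so short non-empty strings count as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Cut `text` so its estimated token count does not exceed `max_tokens`."""
    if not text or max_tokens <= 0:
        return ""
    if estimate_tokens(text) <= max_tokens:
        return text
    clipped = text[: max_tokens * CHARS_PER_TOKEN]
    # Prefer to end on a word boundary when one exists.
    head, sep, _ = clipped.rpartition(" ")
    return head if sep else clipped
```

Because the estimate is approximate, leave some headroom between `target_tokens` and the model's real context limit.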
Symptom: Duplicate information from messages and context overlap
Cause: Context summary includes details already present in recent messages
Solution:
def non_redundant_memory(session_id: str):
    """Intelligent deduplication of memory sources"""
    messages = zep_client.memory.get_messages(session_id=session_id, limit=10)
    context = zep_client.memory.get_context(session_id=session_id)
    
    if not context.summary:
        return messages, context
    
    # Extract key topics from recent messages
    recent_topics = extract_key_topics([msg.content for msg in messages])
    
    # Filter context to exclude recently covered topics
    filtered_context = filter_context_by_topics(
        context.summary, 
        exclude_topics=recent_topics
    )
    
    context.summary = filtered_context
    return messages, context

def extract_key_topics(messages: list) -> set:
    """Extract key topics/entities from recent messages"""
    # Implementation depends on your NLP approach
    # (keyword extraction, NER, etc.). Raising here avoids silently
    # returning None, which would break the filtering step.
    raise NotImplementedError

def filter_context_by_topics(summary: str, exclude_topics: set) -> str:
    """Remove redundant topics from context summary"""
    # Implementation depends on your filtering strategy. Raising here avoids
    # silently replacing the summary with None.
    raise NotImplementedError
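
The two helpers above are left unimplemented. One simple way to fill them in is plain keyword overlap, sketched below. This is an illustrative baseline under stated assumptions (a tiny hand-written stopword list, sentence splitting on punctuation), not a recommended NLP method; entity extraction or embeddings would likely deduplicate more precisely.

```python
import re
from typing import Iterable, Set

# Small illustrative stopword list (assumption); extend it for real use.
STOPWORDS = {
    "the", "and", "for", "that", "this", "with", "have",
    "from", "about", "what", "your", "you", "are", "was",
}


def extract_key_topics(messages: Iterable[str], min_length: int = 4) -> Set[str]:
    """Collect lowercase keywords from recent messages as a crude topic set."""
    topics: Set[str] = set()
    for text in messages:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if len(word) >= min_length and word not in STOPWORDS:
                topics.add(word)
    return topics


def filter_context_by_topics(summary: str, exclude_topics: Set[str]) -> str:
    """Drop summary sentences that mention any topic already covered recently."""
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    kept = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        if sentence and not (words & exclude_topics):
            kept.append(sentence)
    return " ".join(kept)
```

Word-level matching is aggressive: a sentence is dropped when it shares a single keyword with recent messages, so raise `min_length` or extend the stopword list if too much of the summary disappears.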

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| api_url | str | Required | Zep server URL |
| api_key | str | Required | Authentication key |
| session_id | str | Required | Unique session identifier |
| user_id | str | Optional | User identifier for context |
| message_limit | int | 10 | Maximum recent messages to retrieve |
| context_window_hours | int | 24 | Time window for message retrieval |
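
The options above could be gathered into one settings object so the defaults live in a single place. `ZepMemoryConfig` below is a hypothetical container written for illustration; it is not a class provided by PraisonAI or zep-python, and it only mirrors the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZepMemoryConfig:
    """Hypothetical settings holder mirroring the options table."""
    api_url: str
    api_key: str
    session_id: str
    user_id: Optional[str] = None
    message_limit: int = 10
    context_window_hours: int = 24

    def __post_init__(self) -> None:
        # Reject empty required values and non-positive limits early.
        for name in ("api_url", "api_key", "session_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        if self.message_limit <= 0:
            raise ValueError("message_limit must be positive")
        if self.context_window_hours <= 0:
            raise ValueError("context_window_hours must be positive")
```

A frozen dataclass keeps the settings immutable once validated, which avoids retrieval functions changing limits mid-session.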

Best Practices

Choose the right strategy based on your use case:
  • Chat Applications: Use time window strategy (24-48 hours)
  • Task-Oriented Agents: Use token-limited strategy with higher message priority
  • Long-Running Sessions: Use smart deduplication to avoid redundancy
  • Real-Time Systems: Always fetch messages first, context as fallback
Implement robust fallbacks:
import logging

logger = logging.getLogger(__name__)

def robust_memory_retrieval(session_id: str):
    """Fail gracefully when Zep is unavailable"""
    try:
        messages = zep_client.memory.get_messages(session_id=session_id)
        context = zep_client.memory.get_context(session_id=session_id)
        return messages, context
    except Exception as e:
        logger.warning("Zep retrieval failed: %s", e)
        # Fallback to local cache or simplified memory
        # (get_fallback_memory is an application-defined helper)
        return get_fallback_memory(session_id)
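
`get_fallback_memory` is left to the application. A minimal sketch of one option follows: an in-process cache that records messages as they are sent, so recent turns survive a Zep outage. The names `remember_locally` and `MAX_CACHED_MESSAGES`, and the use of `SimpleNamespace` as a stand-in for Zep objects, are assumptions made for illustration.

```python
from collections import defaultdict, deque
from types import SimpleNamespace
from typing import Deque, Dict, List, Tuple

MAX_CACHED_MESSAGES = 20  # Per-session cap (assumption); tune for your app.

# session_id -> most recent messages seen by this process
_local_cache: Dict[str, Deque[SimpleNamespace]] = defaultdict(
    lambda: deque(maxlen=MAX_CACHED_MESSAGES)
)


def remember_locally(session_id: str, role: str, content: str) -> None:
    """Record a message in the local cache whenever it is sent to Zep."""
    _local_cache[session_id].append(SimpleNamespace(role=role, content=content))


def get_fallback_memory(session_id: str) -> Tuple[List[SimpleNamespace], SimpleNamespace]:
    """Return cached messages and an empty context when Zep is unreachable."""
    messages = list(_local_cache.get(session_id, ()))
    # No summary is available offline, so callers see an empty one.
    return messages, SimpleNamespace(summary=None)
```

The cache lives in process memory only, so it is lost on restart and is not shared across workers; a persistent store would be needed for either.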
Optimize for your deployment:
  • Batch Operations: Retrieve memory for multiple sessions at once
  • Caching: Cache context summaries that don’t change frequently
  • Async Operations: Use async Zep client for better throughput
  • Monitoring: Track summary lag and adjust strategies accordingly
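
The caching point above can be sketched as a small time-to-live cache around whatever function fetches the summary. `SummaryCache` is a hypothetical class, not part of zep-python; the `fetch` callable stands in for a function that calls Zep and returns the summary text, and the 60-second default is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SummaryCache:
    """Cache context summaries per session for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # e.g. a function that calls Zep and returns the summary
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, session_id: str) -> Optional[str]:
        """Return a cached summary, refetching once the entry has expired."""
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        summary = self._fetch(session_id)
        self._entries[session_id] = (now, summary)
        return summary

    def invalidate(self, session_id: str) -> None:
        """Drop the cached entry, for example after writing new messages."""
        self._entries.pop(session_id, None)
```

Only the summary is cached here; recent messages should still be fetched on every turn, since they are the source that must stay fresh.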
Validate your memory strategy:
def test_memory_consistency(session_id: str):
    """Test that recent information isn't lost to summary lag"""
    # Add a test message
    test_content = f"Test message at {datetime.now()}"
    zep_client.memory.add_message(session_id, "user", test_content)
    
    # Immediately retrieve memory
    messages, context = get_memory_context(session_id, "test_user")
    
    # Verify test message is in recent messages
    recent_content = [msg.content for msg in messages]
    assert test_content in recent_content, "Recent message lost to summary lag"
