
Knowledge Backends

PraisonAI supports multiple knowledge storage backends through a protocol-driven architecture. This allows you to choose the best backend for your use case while maintaining a consistent API.

Available Backends

| Backend | Description | Best For |
|---|---|---|
| mem0 (default) | Long-term memory with semantic search | Multi-user apps, persistent memory |
| chroma | Local vector database | Development, single-user apps |
| internal | Built-in lightweight storage | Simple use cases |

Agent-First Usage

The recommended way to use knowledge is through the Agent API:
from praisonaiagents import Agent

# Create agent with knowledge (uses mem0 by default)
agent = Agent(
    name="ResearchAssistant",
    instructions="You are a research assistant.",
    knowledge=["./documents/"],  # Add documents
    memory={"user_id": "user123"},  # Required for mem0 backend
)

# Chat automatically retrieves relevant context
response = agent.chat("What are the main findings?")

Scope Identifiers

Knowledge backends support three scope identifiers for multi-tenant isolation:
| Identifier | Purpose | Example |
|---|---|---|
| user_id | Isolate per user | "user_alice" |
| agent_id | Isolate per agent type | "research_agent_v1" |
| run_id | Isolate per session | "session_abc123" |
The mem0 backend requires at least one scope identifier. If none is provided, operations will fail with a ScopeRequiredError.

Example with Scope

from praisonaiagents import Agent

# User-scoped knowledge
agent = Agent(
    name="PersonalAssistant",
    instructions="You are a personal assistant.",
    knowledge=["./user_docs/"],
    memory={"user_id": "alice"},  # Knowledge scoped to Alice
)

# Agent-scoped knowledge (shared across users)
shared_agent = Agent(
    name="CompanyBot",
    instructions="You answer company policy questions.",
    knowledge=["./policies/"],
    agent_id="company_bot_v1",  # Shared knowledge
)

Combining Multiple Scopes

Combine user_id, agent_id, and run_id to isolate knowledge down to a specific session for a specific agent and user.
from praisonaiagents import Agent

agent = Agent(
    name="SupportBot",
    instructions="Answer using the customer's session history.",
    knowledge=["./support_docs/"],
    user_id="customer_42",
    agent_id="support_bot_v1",
    run_id="session_2026_05_30",
)

agent.start("What did we discuss about my refund?")
You can also use the direct API for more control:
# `knowledge` is a Knowledge instance (see "Direct Knowledge API")
results = knowledge.search(
    "refund discussion",
    user_id="customer_42",
    agent_id="support_bot_v1",
    run_id="session_2026_05_30",
)
When you pass more than one scope identifier, PraisonAI automatically combines them using ChromaDB’s $and operator. A single identifier is passed through unchanged. You don’t need to write the $and yourself.
All provided identifiers are required to match (logical AND). Omit an identifier to broaden the scope on that dimension.
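The combination rule can be sketched as a small helper. This is an illustration only: `build_scope_filter` is a hypothetical name, not a PraisonAI function. It assumes the scope identifiers are stored as flat metadata keys and uses the `where` filter syntax that ChromaDB accepts.

```python
from typing import Any, Dict, Optional


def build_scope_filter(
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: build a ChromaDB-style `where` filter from scopes."""
    # Keep only the identifiers that were actually provided.
    candidates = (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id))
    scopes = {key: value for key, value in candidates if value is not None}

    if not scopes:
        return None  # No scope given: nothing to filter on.
    if len(scopes) == 1:
        return scopes  # A single identifier is passed through unchanged.
    # Two or more identifiers: every one must match, so wrap them in $and.
    return {"$and": [{key: value} for key, value in scopes.items()]}


single = build_scope_filter(user_id="customer_42")
combined = build_scope_filter(user_id="customer_42", run_id="session_2026_05_30")
print(single)
print(combined)
```

Omitting an argument leaves that dimension out of the filter, which is how the scope is broadened.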
Multi-tenant SaaS application flow:
  • Per-customer isolation → set user_id
  • Per-agent isolation (e.g. SupportBot vs. SalesBot share infra but not data) → also set agent_id
  • Per-conversation isolation (e.g. ephemeral session memory) → also set run_id

Direct Knowledge API

For advanced use cases, you can use the Knowledge class directly:
from praisonaiagents.knowledge import Knowledge

# Initialize with config
knowledge = Knowledge(config={
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_docs",
            "path": "./.praison/knowledge/my_docs",
        }
    }
})

# Add documents
knowledge.add("./documents/", memory={"user_id": "user123"})

# Search
results = knowledge.search("query", user_id="user123", limit=10)

Normalization Guarantees

PraisonAI normalizes all backend results to ensure consistent behavior:
  • metadata is ALWAYS a dict (never None)
  • text field is always present (mapped from memory for mem0)
  • score is always a float (defaults to 0.0)
This means you can safely access metadata without null checks:
# Safe: metadata is guaranteed to be a dict, so no None check is needed
for result in results['results']:
    source = result['metadata'].get('source', 'unknown')
    # Works even if the backend itself returned metadata=None,
    # because results are normalized before they reach you
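To make the three guarantees concrete, here is a minimal sketch of what such a normalization step could look like. `normalize_result` is a hypothetical function written for this page, not PraisonAI's actual implementation. Falling back to an empty string when neither `text` nor `memory` is present is an assumption.

```python
from typing import Any, Dict


def normalize_result(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalizer illustrating the three guarantees."""
    # text: use "text" if present, otherwise map from mem0's "memory" field.
    text = raw.get("text")
    if text is None:
        text = raw.get("memory", "")  # Assumption: empty string as last resort.

    # metadata: always a dict, never None.
    metadata = raw.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}

    # score: always a float, defaulting to 0.0 when missing or invalid.
    try:
        score = float(raw.get("score"))
    except (TypeError, ValueError):
        score = 0.0

    return {**raw, "text": text, "metadata": metadata, "score": score}


mem0_style = {"memory": "Refunds take 5 days", "metadata": None}
normalized = normalize_result(mem0_style)
print(normalized["text"], normalized["metadata"], normalized["score"])
```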

Protocol-Driven Architecture

All backends implement the KnowledgeStoreProtocol:
from praisonaiagents.knowledge import KnowledgeStoreProtocol

class MyCustomBackend:
    """Custom backend implementing the protocol."""
    
    def search(self, query, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Your implementation
        pass
    
    def add(self, content, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Your implementation
        pass
    
    # ... other methods
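As a rough illustration of the two methods shown above, the following toy backend keeps everything in a Python list and filters by scope. It is a sketch, not a complete protocol implementation: the class name `InMemoryBackend`, the substring matching, the `limit` and `metadata` keyword arguments, and the `{"results": [...]}` return shape (borrowed from the normalization example) are all assumptions, and the protocol's other methods are left out.

```python
from typing import Any, Dict, List


class InMemoryBackend:
    """Toy backend for illustration; not part of PraisonAI."""

    def __init__(self) -> None:
        self._items: List[Dict[str, Any]] = []

    def add(self, content, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Store the text together with the scope it was added under.
        self._items.append({
            "text": content,
            "metadata": dict(kwargs.get("metadata") or {}),
            "scope": {"user_id": user_id, "agent_id": agent_id, "run_id": run_id},
        })

    def search(self, query, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Only identifiers that were provided take part in the filter (logical AND).
        candidates = (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id))
        wanted = {key: value for key, value in candidates if value is not None}
        limit = kwargs.get("limit", 10)

        hits = []
        for item in self._items:
            if any(item["scope"][key] != value for key, value in wanted.items()):
                continue  # Scope mismatch: skip this item.
            # Naive case-insensitive substring match stands in for vector search.
            if query.lower() in item["text"].lower():
                hits.append({"text": item["text"], "metadata": item["metadata"], "score": 1.0})
        return {"results": hits[:limit]}


backend = InMemoryBackend()
backend.add("Refund policy: 30 days", user_id="alice")
backend.add("Refund approved", user_id="bob")
print(backend.search("refund", user_id="alice"))
```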

Configuration Options

mem0 Backend (Default)

config = {
    "vector_store": {
        "provider": "qdrant",  # mem0 uses qdrant by default
        "config": {
            "collection_name": "my_collection",
        }
    }
}

Chroma Backend

config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_collection",
            "path": "./.praison/knowledge/my_collection",
        }
    }
}

Error Handling

from praisonaiagents.knowledge import (
    ScopeRequiredError,
    BackendNotAvailableError,
)

try:
    results = knowledge.search("query")  # Missing scope!
except ScopeRequiredError as e:
    print(f"Please provide user_id, agent_id, or run_id: {e}")
except BackendNotAvailableError as e:
    print(f"Backend not available: {e}")

Collection Naming Rules

Enhanced Security (PR #1597): Knowledge stores now validate collection names to prevent SQL injection attacks.
Knowledge stores that interpolate collection names into DDL/DML now require collection names to match ^[A-Za-z0-9_]+$. Affected backends:
  • Cassandra
  • pgvector
  • SingleStore vector
Invalid names raise:
ValueError("collection_name must be non-empty and contain only alphanumerics and underscores")
Valid examples:
  • my_collection
  • UserData123
  • agent_v2_docs
Invalid examples:
  • my-collection (contains hyphen)
  • user.docs (contains dot)
  • data collection (contains space)
  • ../../etc (path traversal attempt)
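If you want to check names before they reach a backend, the rule above can be sketched as a standalone validator. `validate_collection_name` is a hypothetical helper name; the pattern and error message are taken from the text above, but this is not necessarily how PraisonAI implements the check internally.

```python
import re

# Same pattern as documented above: letters, digits, and underscores only.
_COLLECTION_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def validate_collection_name(name: str) -> str:
    """Hypothetical pre-flight check mirroring the documented rule."""
    # fullmatch rejects a trailing newline, which `$` alone would tolerate.
    if not isinstance(name, str) or not _COLLECTION_NAME_RE.fullmatch(name):
        raise ValueError(
            "collection_name must be non-empty and contain only "
            "alphanumerics and underscores"
        )
    return name


for candidate in ["my_collection", "UserData123", "my-collection", "../../etc"]:
    try:
        validate_collection_name(candidate)
        print(f"{candidate!r}: valid")
    except ValueError:
        print(f"{candidate!r}: invalid")
```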

Best Practices

  1. Always provide at least one scope identifier when using the mem0 backend
  2. Use user_id for user-specific data (multi-tenant apps)
  3. Use agent_id for shared agent knowledge (company policies, FAQs)
  4. Use run_id for ephemeral session data (conversation context)
  5. Prefer the Agent API over the direct Knowledge API for most use cases
  6. Use only letters, digits, and underscores in collection names to ensure compatibility across all backends