Knowledge Base System

The knowledge system provides sophisticated document processing and semantic search capabilities, enabling agents to access and utilise information from various sources.

Key Features

  • Process PDFs, documents, spreadsheets, images, and web content
  • Multiple chunking strategies for optimal text segmentation
  • Vector-based search with optional reranking
  • User, agent, and run-specific knowledge scoping
  • Optional relationship extraction and storage
  • Automatic quality assessment for stored knowledge

Quick Start

from praisonaiagents import Agent

agent = Agent(
    name="Research Assistant",
    instructions="Answer questions using the knowledge base.",
    knowledge=["research_paper.pdf", "data.txt"],
    knowledge_config={
        "vector_store": {
            "provider": "chroma",
            "config": {
                "collection_name": "research_docs",
                "path": ".praison"
            }
        }
    }
)

response = agent.start("What are the key findings?")

Configuration Options

Basic Configuration

knowledge_config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "knowledge_base",
            "path": ".praison",
            "distance_metric": "cosine"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    }
}

Advanced Configuration with Graph Store

knowledge_config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "knowledge_base",
            "path": ".praison"
        }
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0
        }
    },
    "reranker": {
        "enabled": True,
        "default_rerank": False
    }
}

Chunking Strategies

from praisonaiagents.knowledge import Knowledge

# Token-based chunking: fixed-size chunks with overlap between neighbours
kb = Knowledge({
    "chunker": {
        "name": "token",
        "chunk_size": 500,
        "chunk_overlap": 50
    }
})
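
Other chunkers follow the same shape. A minimal sketch of the semantic chunker used in the research-assistant example later on this page, reusing the Knowledge import above (the threshold value is illustrative):

# Semantic chunking: split where adjacent text stops being semantically similar
kb_semantic = Knowledge({
    "chunker": {
        "name": "semantic",
        "threshold": 0.7  # similarity cut-off; illustrative value
    }
})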

Document Processing

Supported File Types

Documents
  • PDF (.pdf)
  • Word (.doc, .docx)
  • Text (.txt)
  • Markdown (.md)
  • RTF (.rtf)

Data
  • Excel (.xls, .xlsx)
  • CSV (.csv)
  • JSON (.json)
  • XML (.xml)

Web and Media
  • Images (OCR)
  • HTML pages
  • Web URLs
  • YouTube videos

Processing Options

# Add with metadata
kb.add(
    "research.pdf",
    user_id="user123",
    metadata={
        "category": "AI Research",
        "year": 2024,
        "author": "Dr. Smith"
    }
)

# Batch processing
documents = ["doc1.pdf", "doc2.txt", "doc3.md"]
for doc in documents:
    kb.add(doc, user_id="user123")

# URL processing
kb.add("https://arxiv.org/pdf/2301.00000.pdf", user_id="user123")

Search Features

# Simple search
results = kb.search("artificial intelligence", limit=5)

# User-scoped search
results = kb.search(
    query="machine learning",
    user_id="user123",
    limit=10
)
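
Results can be inspected directly. A small sketch, assuming each result is a dict with a text field and optional metadata, as in the research-assistant example at the end of this page:

# Print a preview of each hit and its source, if recorded at ingestion time
for result in results:
    print(f"- {result['text'][:100]}...")
    print(f"  Source: {result.get('metadata', {}).get('source')}")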

Advanced Search Options

# Enable Mem0 reranking for better relevance
results = kb.search(
    query="neural networks",
    user_id="user123",
    rerank=True,
    top_k=20  # Retrieve more before reranking
)

Memory Integration

When used with agents, knowledge automatically integrates with memory:

agent = Agent(
    name="Research Assistant",
    knowledge=["papers/"],  # Directory of papers
    knowledge_config=config,
    memory=True  # Enable memory integration
)

# Knowledge is automatically searched during conversations
response = agent.chat("What does the research say about transformers?")

Graph Store Features

Graph stores enable relationship extraction and complex queries beyond simple semantic search.

Configuration

knowledge_config = {
    "graph_store": {
        "provider": "neo4j",  # or "memgraph"
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password"
        }
    },
    "extract_relationships": True
}
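
A minimal sketch wiring this configuration into a Knowledge instance before running the relationship queries below; the document path is illustrative:

from praisonaiagents.knowledge import Knowledge

# Build a graph-enabled knowledge base from the config above
kb = Knowledge(knowledge_config)

# Ingest a document so there are entities and relationships to extract
kb.add("papers/transformers_survey.pdf", user_id="user123")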

Relationship Queries

# Find related concepts
results = kb.search_graph(
    "What concepts are related to transformers?",
    user_id="user123"
)

# Explore connections
results = kb.search_graph(
    "How is attention mechanism connected to BERT?",
    user_id="user123"
)

Best Practices

Chunking Strategy

  • Smaller chunks (100-200 tokens): Better precision
  • Larger chunks (500-1000 tokens): Better context
  • Match chunk size to query complexity (see the sketch below)
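
A sketch of this trade-off using the token chunker from the Chunking Strategies section; the sizes are illustrative, and Knowledge is assumed to be imported as shown there:

# Precision-oriented: small chunks keep each hit tightly focused
precision_kb = Knowledge({
    "chunker": {"name": "token", "chunk_size": 150, "chunk_overlap": 25}
})

# Context-oriented: large chunks preserve surrounding discussion
context_kb = Knowledge({
    "chunker": {"name": "token", "chunk_size": 800, "chunk_overlap": 80}
})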

Organisation

  • Separate collections by domain (see the sketch after this list)
  • Use metadata for filtering
  • Regular cleanup of outdated content
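
A sketch of domain-scoped collections with metadata recorded at ingestion time, reusing the Chroma configuration from earlier; collection names and file paths are illustrative:

# One collection per domain keeps retrieval focused
finance_kb = Knowledge({
    "vector_store": {
        "provider": "chroma",
        "config": {"collection_name": "finance_docs", "path": ".praison"}
    }
})

legal_kb = Knowledge({
    "vector_store": {
        "provider": "chroma",
        "config": {"collection_name": "legal_docs", "path": ".praison"}
    }
})

# Metadata stored with each document supports later filtering and attribution
finance_kb.add("reports/q3_earnings.pdf", user_id="user123",
               metadata={"category": "Finance", "year": 2024})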

Performance

  • Enable caching for repeated queries
  • Use appropriate embedding models
  • Batch document processing

Quality

  • Verify document processing
  • Monitor search relevance
  • Regular reindexing if needed

Example: Research Assistant

from praisonaiagents import Agent
from praisonaiagents.knowledge import Knowledge

# Configure knowledge base
knowledge_config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "research_papers",
            "path": "./knowledge_db"
        }
    },
    "chunker": {
        "name": "semantic",
        "threshold": 0.7
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "reranker": {
        "enabled": True
    }
}

# Create research assistant
research_agent = Agent(
    name="Research Assistant",
    instructions="""You are an expert research assistant.
    Use the knowledge base to provide accurate, well-sourced answers.
    Always cite the specific documents you reference.""",
    knowledge=[
        "papers/ai_safety.pdf",
        "papers/llm_alignment.pdf",
        "https://arxiv.org/pdf/2301.00234.pdf"
    ],
    knowledge_config=knowledge_config,
    knowledge_sources=["research_papers"],  # Named source
    markdown=True
)

# Use the assistant
response = research_agent.chat(
    "What are the main approaches to AI alignment?"
)

# Direct knowledge queries
kb = research_agent.knowledge_instance
papers = kb.search("alignment techniques", limit=5)
for paper in papers:
    print(f"- {paper['text'][:100]}...")
    print(f"  Source: {paper.get('metadata', {}).get('source')}")

Next Steps