Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.praison.ai/llms.txt

Use this file to discover all available pages before exploring further.

Retrieval Strategies

The retrieval strategy system automatically selects the optimal retrieval approach based on corpus size, ensuring efficient and accurate knowledge retrieval.

Overview

Available strategies:
  • DIRECT - Load all content directly (small corpora)
  • BASIC - Semantic search only
  • HYBRID - Keyword + semantic search
  • RERANKED - Hybrid + reranking
  • COMPRESSED - Reranked + compression
  • HIERARCHICAL - Multi-level summaries for very large corpora

Quick Start

from praisonaiagents.rag import select_strategy, RetrievalStrategy

# Auto-select based on corpus size
strategy = select_strategy(corpus_tokens=5000)
print(f"Selected strategy: {strategy.value}")

# Manual strategy selection
strategy = RetrievalStrategy.HYBRID

Strategy Selection

Automatic Selection

The system automatically selects strategies based on corpus size:
Corpus SizeStrategyDescription
< 500 tokensDIRECTLoad all content
< 5,000 tokensBASICSemantic search
< 20,000 tokensHYBRIDKeyword + semantic
< 50,000 tokensRERANKEDHybrid + reranking
< 100,000 tokensCOMPRESSEDWith compression
> 100,000 tokensHIERARCHICALMulti-level summaries
from praisonaiagents.rag import select_strategy

# Examples
select_strategy(corpus_tokens=100)    # -> DIRECT
select_strategy(corpus_tokens=3000)   # -> BASIC
select_strategy(corpus_tokens=10000)  # -> HYBRID
select_strategy(corpus_tokens=30000)  # -> RERANKED
select_strategy(corpus_tokens=75000)  # -> COMPRESSED
select_strategy(corpus_tokens=200000) # -> HIERARCHICAL

Manual Override

from praisonaiagents.rag import RetrievalConfig, RetrievalStrategy

config = RetrievalConfig(
    strategy=RetrievalStrategy.RERANKED,
    top_k=10,
    min_score=0.5,
)

Strategy Details

DIRECT Strategy

Best for very small corpora where all content fits in context:
# All content loaded directly
# No retrieval overhead
# Maximum context utilization

BASIC Strategy

Semantic search using embeddings:
from praisonaiagents.rag import SmartRetriever

retriever = SmartRetriever()
results = retriever.search(
    query="What is the API key?",
    top_k=5,
)

HYBRID Strategy

Combines keyword (BM25) and semantic search:
from praisonaiagents.rag import SmartRetriever

retriever = SmartRetriever(
    use_hybrid=True,
    keyword_weight=0.3,
    semantic_weight=0.7,
)
results = retriever.search(query, top_k=10)

RERANKED Strategy

Adds reranking for improved relevance:
from praisonaiagents.rag import SmartRetriever

retriever = SmartRetriever(use_reranking=True)
results = retriever.search(query, top_k=20)
reranked = retriever.rerank(query, results, top_k=5)

COMPRESSED Strategy

Includes context compression:
from praisonaiagents.rag import ContextCompressor

compressor = ContextCompressor(
    max_tokens=4000,
    target_ratio=0.5,
)
compressed = compressor.compress(chunks, query=query)

HIERARCHICAL Strategy

Uses multi-level summaries:
from praisonaiagents.rag import HierarchicalSummarizer

summarizer = HierarchicalSummarizer(max_levels=3)
result = summarizer.build_hierarchy(files, base_path="./docs")
answer = summarizer.query("What are the main features?")

CLI Usage

# Search with specific strategy
praisonai knowledge search "query" --strategy hybrid

# Search with reranking
praisonai knowledge search "query" --rerank

# Search with compression
praisonai knowledge search "query" --compress --max-context-tokens 4000

# Combined options
praisonai knowledge search "query" --strategy reranked --rerank --compress

Integration with Agents

from praisonaiagents import Agent

agent = Agent(
    name="SmartRetriever",
    instructions="Answer questions using the knowledge base.",
    knowledge={
        "sources": ["./docs"],
        "retrieval_k": 10,
        "rerank": True,
    }
)

response = agent.chat("What are the key features?")

Best Practices

  1. Start with auto-selection - Let the system choose based on corpus size
  2. Monitor performance - Use profiling to identify bottlenecks
  3. Adjust for quality - Increase top_k and enable reranking for better results
  4. Consider latency - More complex strategies add latency

API Reference

RetrievalStrategy Enum

class RetrievalStrategy(str, Enum):
    DIRECT = "direct"
    BASIC = "basic"
    HYBRID = "hybrid"
    RERANKED = "reranked"
    COMPRESSED = "compressed"
    HIERARCHICAL = "hierarchical"

select_strategy Function

def select_strategy(
    corpus_tokens: int = 0,
    corpus_files: int = 0,
    query_complexity: str = "medium",
) -> RetrievalStrategy:
    """Select optimal strategy based on corpus characteristics."""