Retrieval Strategies Module

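The retrieval module provides four strategies (basic, fusion, recursive, and auto-merge) for retrieving relevant documents from the knowledge base, along with utility functions for fusing ranked results and merging chunks.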

Quick Start

from praisonaiagents.knowledge.retrieval import (
    RetrievalStrategy,
    RetrievalResult,
    RetrieverProtocol,
    get_retriever_registry,
    reciprocal_rank_fusion,
    merge_adjacent_chunks
)

# Use built-in RRF fusion
results_list = [
    [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.8}],
    [{"id": "2", "score": 0.95}, {"id": "3", "score": 0.7}]
]
fused = reciprocal_rank_fusion(results_list, k=60)

Retrieval Strategies

RetrievalStrategy Enum

# Import path: praisonaiagents.knowledge.retrieval.RetrievalStrategy
# The definition is reproduced here for reference.
from enum import Enum

class RetrievalStrategy(Enum):
    BASIC = "basic"            # Simple vector similarity
    FUSION = "fusion"          # Multi-query with RRF
    RECURSIVE = "recursive"    # Depth-limited recursive retrieval
    AUTO_MERGE = "auto_merge"  # Parent-child merging

Strategy Descriptions

Strategy     Description                                  Use Case
basic        Simple vector similarity search              General queries
fusion       Multiple queries + Reciprocal Rank Fusion    Complex queries
recursive    Follows references between chunks            Hierarchical docs
auto_merge   Merges child chunks into parents             Long documents
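
Because each member's value is the string shown in the table, a configuration string can be mapped to a member by value. The sketch below uses a local stand-in that mirrors the enum above so that it runs without the library installed. `parse_strategy` is a hypothetical helper for illustration, not part of the module.

```python
from enum import Enum


class RetrievalStrategy(Enum):
    """Local stand-in mirroring the enum documented above."""
    BASIC = "basic"
    FUSION = "fusion"
    RECURSIVE = "recursive"
    AUTO_MERGE = "auto_merge"


def parse_strategy(name: str) -> RetrievalStrategy:
    """Hypothetical helper: map a config string to a strategy member."""
    try:
        # Enum lookup by value; normalise whitespace and case first.
        return RetrievalStrategy(name.strip().lower())
    except ValueError:
        valid = ", ".join(s.value for s in RetrievalStrategy)
        raise ValueError(
            f"Unknown strategy {name!r}; expected one of: {valid}"
        ) from None


strategy = parse_strategy("fusion")
print(strategy)  # RetrievalStrategy.FUSION
```

Failing early on an unknown name keeps a typo in configuration from silently falling back to a different strategy.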

Classes

RetrievalResult

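Dataclass holding a single retrieved item: its text, relevance score, optional metadata, and optional source document ID.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass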
class RetrievalResult:
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None

RetrieverProtocol

Protocol for retriever implementations.
from typing import List, Protocol

class RetrieverProtocol(Protocol):
    name: str
    strategy: RetrievalStrategy
    
    def retrieve(
        self,
        query: str,
        top_k: int = 10,
        **kwargs
    ) -> List[RetrievalResult]:
        """Retrieve relevant documents."""
        ...

Utility Functions

reciprocal_rank_fusion

Combine results from multiple retrievers using RRF.
from praisonaiagents.knowledge.retrieval import reciprocal_rank_fusion

# Results from multiple queries/retrievers
results_a = [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.8}]
results_b = [{"id": "2", "score": 0.95}, {"id": "3", "score": 0.7}]

# Fuse with RRF (k=60 is standard)
fused = reciprocal_rank_fusion([results_a, results_b], k=60)
# Returns: [{"id": "2", "rrf_score": ...}, {"id": "1", ...}, {"id": "3", ...}]
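
To make the scoring concrete, here is a minimal sketch of the standard RRF formula, in which each document receives the sum of 1 / (k + rank) over every list it appears in. It is an illustration, not the library's implementation: `rrf_sketch` is a hypothetical name, and the sketch assumes each input list is already ordered best-first and that ranks start at 1. The library may differ in either respect and may carry extra fields through to the output.

```python
from collections import defaultdict
from typing import Any, Dict, List


def rrf_sketch(
    results_list: List[List[Dict[str, Any]]], k: int = 60
) -> List[Dict[str, Any]]:
    """Illustrative RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for results in results_list:
        # Assumption: each list is sorted best-first; ranks start at 1.
        for rank, item in enumerate(results, start=1):
            scores[item["id"]] += 1.0 / (k + rank)
    # Sorting the accumulated scores dominates the cost.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [{"id": doc_id, "rrf_score": score} for doc_id, score in ranked]


results_a = [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.8}]
results_b = [{"id": "2", "score": 0.95}, {"id": "3", "score": 0.7}]
fused = rrf_sketch([results_a, results_b], k=60)
print([doc["id"] for doc in fused])  # ['2', '1', '3']
```

Document "2" ranks first because it appears in both lists (1/62 + 1/61), while "1" (1/61) and "3" (1/62) each appear in only one. Only ranks are used, so the original scores do not need to be on comparable scales.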

merge_adjacent_chunks

Merge consecutive chunks from the same document.
from praisonaiagents.knowledge.retrieval import merge_adjacent_chunks

chunks = [
    {"text": "Part 1", "doc_id": "doc1", "chunk_idx": 0},
    {"text": "Part 2", "doc_id": "doc1", "chunk_idx": 1},
    {"text": "Other", "doc_id": "doc2", "chunk_idx": 0}
]

merged = merge_adjacent_chunks(chunks)
# Merges adjacent chunks from same document
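
A single-pass merge of this kind can be sketched as follows. This is an illustration rather than the library's implementation: `merge_adjacent_sketch` and the `last_idx` bookkeeping key are hypothetical, the input is assumed to be ordered by document and chunk index, and joining texts with a single space is an arbitrary choice.

```python
from typing import Any, Dict, List


def merge_adjacent_sketch(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fold each chunk into the previous one when it directly follows it."""
    merged: List[Dict[str, Any]] = []
    for chunk in chunks:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["doc_id"] == chunk["doc_id"]
            and chunk["chunk_idx"] == prev["last_idx"] + 1
        ):
            # Same document and consecutive index: extend the previous entry.
            prev["text"] += " " + chunk["text"]
            prev["last_idx"] = chunk["chunk_idx"]
        else:
            # New document or a gap in indices: start a new entry.
            merged.append({
                "text": chunk["text"],
                "doc_id": chunk["doc_id"],
                "chunk_idx": chunk["chunk_idx"],
                "last_idx": chunk["chunk_idx"],
            })
    return merged


chunks = [
    {"text": "Part 1", "doc_id": "doc1", "chunk_idx": 0},
    {"text": "Part 2", "doc_id": "doc1", "chunk_idx": 1},
    {"text": "Other", "doc_id": "doc2", "chunk_idx": 0},
]
merged = merge_adjacent_sketch(chunks)
print([m["text"] for m in merged])  # ['Part 1 Part 2', 'Other']
```

Each chunk is compared only with the most recent merged entry, so the cost is linear in the number of chunks.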

Using with Knowledge

from praisonaiagents import Agent

# Configure retrieval strategy
agent = Agent(
    instructions="You are a helpful assistant",
    knowledge=["./docs/"],
    knowledge_config={
        "retrieval_strategy": "fusion",  # Use fusion retrieval
        "top_k": 10
    }
)

response = agent.chat("What is the architecture?")

Creating Custom Retrievers

from typing import List

from praisonaiagents.knowledge.retrieval import (
    RetrievalStrategy,
    RetrievalResult,
    get_retriever_registry
)

class MyRetriever:
    name = "my_retriever"
    strategy = RetrievalStrategy.BASIC
    
    def __init__(self, vector_store, **config):
        self.store = vector_store
    
    def retrieve(
        self,
        query: str,
        top_k: int = 10,
        **kwargs
    ) -> List[RetrievalResult]:
        # Custom retrieval logic
        ...

# Register
registry = get_retriever_registry()
registry.register("my_retriever", MyRetriever)
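
For a sense of what the body of `retrieve` might contain, here is a toy retriever with a complete implementation. It is a sketch under stated assumptions: `KeywordRetriever` and its term-overlap scoring are hypothetical, and `RetrievalResult` is redefined locally (mirroring the dataclass documented above) so the example runs without the library installed. With the library, import the real class and set a `strategy` attribute as in `MyRetriever`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RetrievalResult:
    """Local stand-in mirroring the dataclass documented above."""
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None


class KeywordRetriever:
    """Hypothetical retriever: scores by fraction of query terms present."""
    name = "keyword_retriever"

    def __init__(self, documents: Dict[str, str]):
        self.documents = documents  # maps doc_id -> text

    def retrieve(
        self, query: str, top_k: int = 10, **kwargs
    ) -> List[RetrievalResult]:
        terms = set(query.lower().split())
        if not terms:
            return []
        results: List[RetrievalResult] = []
        for doc_id, text in self.documents.items():
            words = set(text.lower().split())
            overlap = len(terms & words) / len(terms)
            if overlap > 0:  # skip documents sharing no query terms
                results.append(
                    RetrievalResult(text=text, score=overlap, doc_id=doc_id)
                )
        results.sort(key=lambda r: r.score, reverse=True)
        return results[:top_k]


retriever = KeywordRetriever({
    "a": "fusion combines ranked lists",
    "b": "chunks are merged into parents",
})
hits = retriever.retrieve("ranked fusion", top_k=1)
print(hits[0].doc_id)  # a
```

The class has the `name` attribute and `retrieve` signature that `RetrieverProtocol` describes, which is what matters for structural typing; it does not need to inherit from the protocol.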

Performance

  • All utility functions are pure Python (no external dependencies)
  • RRF fusion is O(n log n), where n is the total number of results across all input lists
  • Chunk merging is O(n), done in a single pass