Documentation Index
Fetch the complete documentation index at: https://docs.praison.ai/llms.txt
Use this file to discover all available pages before exploring further.
Smart Retrieval
Smart retrieval combines multiple search techniques for optimal relevance: keyword prefiltering, semantic search, and reranking.
Overview
The SmartRetriever provides:
- Hybrid search combining keyword (BM25) and semantic search
- Keyword prefiltering for efficient candidate selection
- Semantic reranking for improved relevance
- Score normalization across different search methods
Quick Start
from praisonaiagents.rag import SmartRetriever, RetrievalResult
retriever = SmartRetriever()
# Basic search
results = retriever.search(
query="What is the API authentication method?",
top_k=5,
)
for result in results:
print(f"Score: {result.score:.3f} - {result.text[:100]}...")
Hybrid Search
Combining Keyword and Semantic Search
from praisonaiagents.rag import SmartRetriever
retriever = SmartRetriever(
use_hybrid=True,
keyword_weight=0.3,
semantic_weight=0.7,
)
results = retriever.search(
query="authentication API key",
top_k=10,
)
How Hybrid Search Works
- Keyword Search (BM25) - Fast lexical matching
- Semantic Search - Embedding-based similarity
- Score Fusion - Weighted combination of scores
- Deduplication - Remove duplicate results
Reranking
Using the Reranker
from praisonaiagents.rag import SmartRetriever, RetrievalResult
retriever = SmartRetriever()
# Initial search with more candidates
initial_results = retriever.search(query, top_k=20)
# Rerank to get best results
reranked = retriever.rerank(
query=query,
results=initial_results,
top_k=5,
)
Custom Reranker
from praisonaiagents.rag import RerankerProtocol, RetrievalResult
from typing import List
class CustomReranker(RerankerProtocol):
def rerank(
self,
query: str,
results: List[RetrievalResult],
top_k: int = 5,
) -> List[RetrievalResult]:
# Custom reranking logic
scored = []
for result in results:
# Calculate custom relevance score
score = self._calculate_relevance(query, result.text)
scored.append((score, result))
# Sort by score and return top_k
scored.sort(key=lambda x: x[0], reverse=True)
return [r for _, r in scored[:top_k]]
def _calculate_relevance(self, query: str, text: str) -> float:
# Implement custom scoring
overlap = len(set(query.lower().split()) & set(text.lower().split()))
return overlap / max(len(query.split()), 1)
Retrieval Results
RetrievalResult Structure
from dataclasses import dataclass
from typing import Dict, Any
@dataclass
class RetrievalResult:
text: str
score: float
metadata: Dict[str, Any] = None
source: str = None
chunk_id: str = None
Working with Results
results = retriever.search(query, top_k=5)
for result in results:
print(f"Score: {result.score:.3f}")
print(f"Text: {result.text[:200]}...")
print(f"Source: {result.metadata.get('source', 'unknown')}")
print("---")
CLI Usage
# Basic search
praisonai knowledge search "API authentication"
# Search with reranking
praisonai knowledge search "API authentication" --rerank
# Search with specific top_k
praisonai knowledge search "API authentication" --top-k 10
# Verbose output
praisonai knowledge search "API authentication" --rerank --verbose
Integration with Agents
from praisonaiagents import Agent
agent = Agent(
name="SmartSearchAgent",
instructions="Answer questions using the knowledge base.",
knowledge={
"sources": ["./docs"],
"retrieval_k": 10,
"rerank": True,
"retrieval_threshold": 0.5,
}
)
response = agent.chat("How do I authenticate with the API?")
Best Practices
- Use hybrid search for diverse corpora with technical terms
- Enable reranking for improved relevance at slight latency cost
- Adjust weights based on your content type
- Set min_score to filter low-quality results
API Reference
SmartRetriever
class SmartRetriever:
def __init__(
self,
use_hybrid: bool = False,
keyword_weight: float = 0.3,
semantic_weight: float = 0.7,
use_reranking: bool = False,
):
"""Initialize smart retriever."""
def search(
self,
query: str,
top_k: int = 5,
min_score: float = 0.0,
) -> List[RetrievalResult]:
"""Search for relevant documents."""
def rerank(
self,
query: str,
results: List[RetrievalResult],
top_k: int = 5,
) -> List[RetrievalResult]:
"""Rerank results for improved relevance."""
RetrieverProtocol
class RetrieverProtocol(Protocol):
def search(
self,
query: str,
top_k: int = 5,
) -> List[RetrievalResult]:
"""Search for relevant documents."""
RerankerProtocol
class RerankerProtocol(Protocol):
def rerank(
self,
query: str,
results: List[RetrievalResult],
top_k: int = 5,
) -> List[RetrievalResult]:
"""Rerank results."""