Incremental Indexing
Incremental indexing tracks file changes using hashes and modification times, ensuring only modified files are re-indexed across sessions.
Overview
The incremental indexing system provides:
- File hash tracking for accurate change detection
- Modification time monitoring as a fast-path check
- Persistent state across indexing sessions
- Ignore patterns via .praisonignore files
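The combination of a modification-time fast path and a content hash can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `file_fingerprint` and `has_changed_since` are hypothetical, and the choice of SHA-256 is an assumption, since this page does not say which hash is used.

```python
import hashlib
import os
from typing import Optional


def file_fingerprint(path: str) -> dict:
    """Collect size, mtime, and a SHA-256 content hash for one file."""
    stat = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "path": path,
        "hash": digest.hexdigest(),
        "mtime": stat.st_mtime,
        "size": stat.st_size,
    }


def has_changed_since(path: str, previous: Optional[dict]) -> bool:
    """Return True when the file differs from its previously recorded state."""
    if previous is None:
        return True  # Never indexed before.
    stat = os.stat(path)
    # Fast path: identical mtime and size means we skip hashing entirely.
    if stat.st_mtime == previous["mtime"] and stat.st_size == previous["size"]:
        return False
    # Slow path: metadata differs, so compare the content hashes.
    return file_fingerprint(path)["hash"] != previous["hash"]
```

With this approach, a file that was merely touched (new mtime, same content) is hashed once and then reported as unchanged, while an edited file is reported as changed.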
Quick Start
from praisonaiagents.knowledge import Knowledge

knowledge = Knowledge()

# First index - indexes all files
result1 = knowledge.index(
    "./docs",
    memory={"user_id": "my_user"},
    incremental=True,
)
print(f"Indexed: {result1.files_indexed}, Skipped: {result1.files_skipped}")

# Second index - skips unchanged files
result2 = knowledge.index(
    "./docs",
    memory={"user_id": "my_user"},
    incremental=True,
)
print(f"Indexed: {result2.files_indexed}, Skipped: {result2.files_skipped}")
How It Works
File Tracking
The FileTracker class maintains state about indexed files:
from praisonaiagents.knowledge.indexing import FileTracker
tracker = FileTracker(state_file=".praison/.index_state.json")
tracker.load()
# Check if file has changed
if tracker.has_changed("./docs/readme.md"):
    print("File needs re-indexing")
else:
    print("File unchanged, skipping")
State Persistence
State is automatically saved to .praison/.index_state.json:
{
  "/path/to/file1.txt": {
    "path": "/path/to/file1.txt",
    "hash": "abc123...",
    "mtime": 1704067200.0,
    "size": 1024
  }
}
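To illustrate how a state file of this shape could be read and written, here is a small sketch using only the standard library. The helpers `load_state` and `save_state` are hypothetical and are not part of praisonaiagents. The write-to-temp-then-rename step is an assumption about good practice, not a description of what FileTracker does.

```python
import json
import os
import tempfile


def load_state(state_file: str) -> dict:
    """Return the saved state, or an empty dict when no state file exists yet."""
    if not os.path.exists(state_file):
        return {}
    with open(state_file, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_state(state_file: str, state: dict) -> None:
    """Write the state via a temporary file so a crash cannot leave it half-written."""
    directory = os.path.dirname(state_file) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    # os.replace swaps the new file into place in one step.
    os.replace(tmp_path, state_file)
```

Deleting the state file makes `load_state` return an empty dict, which in a scheme like this would cause every file to be treated as new on the next run.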
Ignore Patterns
Using .praisonignore
Create a .praisonignore file in your corpus directory:
# Ignore log files
*.log
# Ignore test directories
test/
tests/
__pycache__/
# Ignore specific files
secrets.txt
.env
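As a rough illustration of how patterns like these can be applied, the sketch below parses ignore-file text and matches relative paths with the standard `fnmatch` module. It is a simplified stand-in, not the library's matcher: the function names are hypothetical, and full gitignore semantics (negation with `!`, anchored patterns, `**`) are deliberately left out.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List


def parse_ignore_lines(text: str) -> List[str]:
    """Keep non-empty lines that are not comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(relative_path: str, patterns: List[str]) -> bool:
    """Check a POSIX-style relative path against simple glob patterns."""
    parts = PurePosixPath(relative_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore when any parent directory matches.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatch.fnmatchcase(parts[-1], pattern):
            # File pattern: compare against the file name only.
            return True
    return False
```

For example, with the patterns `*.log` and `tests/`, both `app/debug.log` and `tests/test_a.py` would be ignored, while `docs/readme.md` would not.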
Programmatic Exclusion
result = knowledge.index(
    "./docs",
    memory={"user_id": "my_user"},
    exclude_glob=["*.log", "test_*", "*.tmp"],
    include_glob=["*.md", "*.txt", "*.py"],
)
CLI Usage
# Incremental index (default)
praisonai knowledge index ./docs --user-id myuser
# Force full re-index
praisonai knowledge index ./docs --user-id myuser --full
# With include/exclude patterns
praisonai knowledge index ./docs -i "*.md,*.txt" -e "*.log,test_*"
# Verbose output showing skipped files
praisonai knowledge index ./docs --verbose
Index Results
The IndexResult dataclass provides detailed statistics:
from praisonaiagents.knowledge.indexing import IndexResult
result = knowledge.index("./docs", memory={"user_id": "my_user"})
print(f"Files indexed: {result.files_indexed}")
print(f"Files skipped: {result.files_skipped}")
print(f"Chunks created: {result.chunks_created}")
print(f"Duration: {result.duration_seconds:.2f}s")
print(f"Errors: {result.errors}")
Corpus Statistics
Get statistics about your indexed corpus:
from praisonaiagents.knowledge.indexing import CorpusStats
# From directory scan
stats = CorpusStats.from_directory("./docs")
print(f"Files: {stats.file_count}")
print(f"Estimated tokens: {stats.total_tokens}")
print(f"Recommended strategy: {stats.strategy_recommendation}")
Integration with Agents
from praisonaiagents import Agent
# Agent with knowledge automatically uses incremental indexing
agent = Agent(
    name="DocExpert",
    instructions="Answer questions using the knowledge base.",
    knowledge=["./docs"],
    memory={"user_id": "my_user"},  # Enables user-scoped indexing
)
# First chat triggers indexing
response = agent.chat("What are the key features?")
# Subsequent chats use cached index (incremental)
response = agent.chat("Tell me more about authentication")
Best Practices
- Use incremental mode - Default behavior, saves time on large corpora
- Set up ignore patterns - Exclude logs, tests, and temporary files
- Monitor index results - Check for errors and unexpected skips
- Periodic full re-index - Use --full occasionally to ensure consistency
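The "monitor index results" practice could be automated along these lines. This is only a sketch: `IndexResultStandIn` is a local stand-in that mirrors the documented IndexResult fields so the example runs without the library, and `review_index_result` with its `expected_min_files` threshold is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexResultStandIn:
    """Local stand-in with the same fields as the documented IndexResult."""
    files_indexed: int = 0
    files_skipped: int = 0
    chunks_created: int = 0
    duration_seconds: float = 0.0
    errors: List[str] = field(default_factory=list)


def review_index_result(result, expected_min_files: int = 1) -> List[str]:
    """Return human-readable warnings for a finished indexing run."""
    warnings = []
    if result.errors:
        warnings.append(f"{len(result.errors)} error(s) reported during indexing")
    seen = result.files_indexed + result.files_skipped
    if seen < expected_min_files:
        warnings.append(
            f"only {seen} file(s) seen, expected at least {expected_min_files}"
        )
    if result.files_indexed > 0 and result.chunks_created == 0:
        warnings.append("files were indexed but no chunks were created")
    return warnings
```

An empty list means the run looks healthy. A non-empty list is a reasonable trigger for a verbose run or a full re-index.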
API Reference
IndexResult
@dataclass
class IndexResult:
    files_indexed: int = 0
    files_skipped: int = 0
    chunks_created: int = 0
    duration_seconds: float = 0.0
    errors: List[str] = field(default_factory=list)
    corpus_stats: Optional[CorpusStats] = None
FileTracker
class FileTracker:
    def __init__(self, state_file: Optional[str] = None):
        """Initialize tracker with optional state file."""

    def has_changed(self, filepath: str) -> bool:
        """Check if file has changed since last index."""

    def mark_indexed(self, filepath: str, info: Dict) -> None:
        """Mark file as indexed."""

    def save(self) -> None:
        """Save state to file."""

    def load(self) -> None:
        """Load state from file."""
IgnoreMatcher
class IgnoreMatcher:
    def __init__(self, patterns: List[str] = None):
        """Initialize with ignore patterns."""

    def should_ignore(self, path: str) -> bool:
        """Check if path should be ignored."""

    @classmethod
    def from_directory(cls, directory: str) -> "IgnoreMatcher":
        """Load patterns from .praisonignore and .gitignore."""