## Quick Start
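The original quick-start snippet did not survive extraction, so here is a minimal, hypothetical sketch built only from the parameters documented below. The dict shape is illustrative; pass it to your knowledge-base constructor in whatever form the library expects.

```python
# Minimal configuration using only the documented parameters.
# All values except the defaults noted in the table are illustrative.
quickstart_config = {
    "sources": ["docs/", "https://example.com/faq.html"],  # files, dirs, or URLs
    "embedder": "openai",             # default embedding provider
    "chunking_strategy": "semantic",  # default chunking method
    "retrieval_k": 5,                 # number of chunks to retrieve
}

print(quickstart_config["sources"])
```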
## Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `sources` | `List[str]` | `[]` | Files, directories, or URLs |
| `embedder` | `str` | `"openai"` | Embedding provider |
| `embedder_config` | `Dict \| None` | `None` | Embedder-specific settings |
| `chunking_strategy` | `str \| ChunkingStrategy` | `"semantic"` | Chunking method (`fixed`, `semantic`, `sentence`, `paragraph`) |
| `chunk_size` | `int` | `1000` | Target chunk size in tokens |
| `chunk_overlap` | `int` | `200` | Overlap between chunks |
| `chunker` | `Dict \| None` | `None` | Alternative chunker config dict |
| `retrieval_k` | `int` | `5` | Number of chunks to retrieve |
| `retrieval_threshold` | `float` | `0.0` | Minimum similarity threshold |
| `rerank` | `bool` | `False` | Enable result reranking |
| `rerank_model` | `str \| None` | `None` | Model for reranking |
| `auto_retrieve` | `bool` | `True` | Auto-inject relevant context |
| `vector_store` | `Dict \| None` | `None` | Vector database settings |
| `config` | `Dict \| None` | `None` | Advanced configuration dict |
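The defaults above can be captured in one place. The sketch below restates the table as a Python dict and adds a small helper that overlays user overrides and rejects unknown keys; this is illustrative scaffolding, not the library's own validation code.

```python
# Documented defaults, transcribed from the parameter table.
DEFAULTS = {
    "sources": [],
    "embedder": "openai",
    "embedder_config": None,
    "chunking_strategy": "semantic",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "chunker": None,
    "retrieval_k": 5,
    "retrieval_threshold": 0.0,
    "rerank": False,
    "rerank_model": None,
    "auto_retrieve": True,
    "vector_store": None,
    "config": None,
}

def make_config(**overrides):
    """Return the defaults with overrides applied; reject unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

cfg = make_config(retrieval_k=10, rerank=True)
```

Unspecified keys keep their documented defaults, so `cfg["chunk_size"]` stays `1000` here.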
## Common Patterns
### Pattern 1: High-Precision Retrieval
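The original example for this pattern is missing, so here is a hedged sketch: fewer retrieved chunks, a similarity floor, and reranking enabled. The numeric values and the model name are illustrative starting points, not library defaults.

```python
# High-precision retrieval: retrieve fewer candidates, drop weak
# matches, and rerank the survivors with a cross-encoder.
high_precision = {
    "retrieval_k": 3,             # fewer candidates per query
    "retrieval_threshold": 0.7,   # filter low-similarity matches
    "rerank": True,               # enable result reranking
    "rerank_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",  # example choice
}
```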
### Pattern 2: Custom Chunking
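A sketch of this pattern using the documented chunking parameters: switching from the default `semantic` strategy to fixed-size chunks with overlap. The sizes shown are illustrative.

```python
# Custom chunking: fixed-size chunks instead of the default
# semantic strategy. Useful when inputs are uniform and token
# budgets are tight.
custom_chunking = {
    "chunking_strategy": "fixed",
    "chunk_size": 512,     # target chunk size in tokens
    "chunk_overlap": 64,   # tokens shared between adjacent chunks
}
```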
### Pattern 3: External Vector Store
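A sketch of this pattern using the `vector_store` parameter. The keys inside the nested dict (`provider`, `config`, `path`, `collection`) are assumptions for illustration; consult your vector database integration for the exact schema.

```python
# External vector store: persist embeddings outside the default store.
# The nested schema below is hypothetical.
external_store = {
    "sources": ["manuals/"],
    "vector_store": {
        "provider": "chroma",           # hypothetical provider name
        "config": {
            "path": ".chroma",          # hypothetical on-disk location
            "collection": "knowledge",  # hypothetical collection name
        },
    },
}
```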
## Best Practices
### Use Semantic Chunking for Documents
Semantic chunking preserves context better than fixed-size chunks, especially for technical documentation.
### Enable Reranking for Accuracy
Reranking improves retrieval precision by using a cross-encoder to reorder results.
### Set Retrieval Threshold
Use `retrieval_threshold` to filter low-quality matches and prevent hallucination.
### Auto-Retrieve by Default
Keep `auto_retrieve=True` to automatically inject relevant context into prompts.
