Quick Start
Available Strategies
| Strategy | Best For | Speed |
|---|---|---|
token | Fixed-size chunks | ⚡ Fastest |
sentence | Natural boundaries | ⚡ Fast |
recursive | Structured docs (markdown) | ⚡ Fast |
semantic | Topic segmentation | 🔄 Medium |
sdpm | Research papers | 🔄 Medium |
late | Best embeddings | 🔄 Medium |
Token Chunking
Fixed-size token chunks. Fast and predictable.
Sentence Chunking
Split at sentence boundaries. Natural flow.
Recursive Chunking
Hierarchical splitting. Great for markdown.
Semantic Chunking
Similarity-based splits. Topic coherence.
Chunker Configuration
All Parameters
| Parameter | Type | Default | Applies To |
|---|---|---|---|
type | str | "token" | All |
chunk_size | int | 512 | All |
chunk_overlap | int | 128 | token, sentence |
tokenizer_or_token_counter | str | "gpt2" | token, sentence, recursive |
embedding_model | str | auto | semantic, sdpm, late |
Strategy Examples
- Token
- Sentence
- Recursive
- Semantic

