Quick Start
When to Use
Good For
- Markdown documentation
- Technical manuals
- Structured content
- Code with comments
Consider Alternatives
- Unstructured prose
- Stream of consciousness
- Very short documents
- Content that needs topic-based splitting
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| chunk_size | int | 512 | Max tokens per chunk |
| tokenizer_or_token_counter | str | "gpt2" | Tokenizer for counting |
Examples
Documentation
Large Codebase
How It Works
The recursive approach tries larger separators first (paragraphs), then falls back to smaller ones (sentences, then words) only when a piece still exceeds the chunk size.
Best Practices
- Match chunk size to content density: dense technical docs need smaller chunks
- Use with markdown: recursive chunking respects markdown structure well
- Combine with semantic search: the hierarchical splits provide logical boundaries for retrieval
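
To make the fallback order described under How It Works concrete, here is a minimal, self-contained sketch of recursive splitting. It is not the library's implementation. The function `recursive_chunk`, the separator list `DEFAULT_SEPARATORS`, and the whitespace-based `count_words` counter are all hypothetical stand-ins chosen for illustration; a real setup would count tokens with the configured tokenizer (for example "gpt2") rather than words.

```python
from typing import Callable, List, Sequence

# Hypothetical separator hierarchy: paragraphs, then sentences, then words.
DEFAULT_SEPARATORS = ("\n\n", ". ", " ")


def count_words(text: str) -> int:
    """Stand-in token counter: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def recursive_chunk(
    text: str,
    chunk_size: int = 512,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
    count: Callable[[str], int] = count_words,
) -> List[str]:
    """Split text into chunks of at most chunk_size units where a separator allows.

    If no separator is left, the text is returned as-is even when oversized.
    Separators at chunk boundaries are dropped in this sketch.
    """
    if count(text) <= chunk_size or not separators:
        # Fits already, or there is no finer separator left to try.
        return [text] if text.strip() else []

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece.strip():
            continue
        candidate = current + sep + piece if current else piece
        if count(candidate) <= chunk_size:
            # Greedily merge neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if count(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: fall back to the next, smaller separator.
            chunks.extend(recursive_chunk(piece, chunk_size, finer, count))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Intro paragraph here.\n\n"
    "One two three four five six. Seven eight nine ten eleven twelve.\n\n"
    "End."
)
for chunk in recursive_chunk(sample, chunk_size=6):
    print(repr(chunk))
```

With a limit of 6 words, the first and last paragraphs stay whole, while the 12-word middle paragraph is too large and is split at the sentence boundary instead. Lowering `chunk_size` forces more fallbacks to smaller separators, which is the trade-off behind matching chunk size to content density.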

