Token chunking is the simplest and fastest chunking strategy. It splits text into fixed-size token chunks with optional overlap.
Quick Start
from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from documents.",
    knowledge={
        "sources": ["document.pdf"],
        "chunker": {
            "type": "token",
            "chunk_size": 256,
            "chunk_overlap": 50
        }
    }
)

response = agent.start("Summarize the main points")
When to Use
Good For
- Speed-critical applications
- Uniform chunk sizes needed
- Simple documents without structure
- High-volume processing
Consider Alternatives
- Documents with natural sections
- Topic-dependent content
- Need semantic coherence
- Complex nested structures
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| chunk_size | int | 512 | Number of tokens per chunk |
| chunk_overlap | int | 128 | Token overlap between chunks |
| tokenizer | str | "gpt2" | Tokenizer for counting tokens |
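
To see how chunk_size and chunk_overlap interact, the sketch below estimates how many chunks a document yields. It assumes a plain sliding window in which each chunk starts chunk_size - chunk_overlap tokens after the previous one; the library's exact boundary handling may differ, and `estimate_chunk_count` is a hypothetical helper, not part of praisonaiagents.

```python
import math


def estimate_chunk_count(total_tokens, chunk_size=512, chunk_overlap=128):
    """Estimate the chunk count for a sliding window (hypothetical helper).

    Assumes each chunk starts `chunk_size - chunk_overlap` tokens after the
    previous one, which is an assumption about the chunker's behavior.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    # One full first chunk, then one more chunk per step needed to reach the end.
    return 1 + math.ceil((total_tokens - chunk_size) / step)


# Compare the table defaults with the Quick Start settings on a 10,000-token document.
default_count = estimate_chunk_count(10_000)
quick_start_count = estimate_chunk_count(10_000, chunk_size=256, chunk_overlap=50)
print(default_count, quick_start_count)
```

Under this assumption, a larger overlap shrinks the step between chunks, so the same document produces more chunks and more embeddings to store.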
Examples
Small Chunks for Precision
from praisonaiagents import Agent

agent = Agent(
    instructions="Find specific details.",
    knowledge={
        "sources": ["technical_spec.pdf"],
        "chunker": {
            "type": "token",
            "chunk_size": 128,  # Small for precision
            "chunk_overlap": 32
        }
    }
)
Large Chunks for Context
from praisonaiagents import Agent

agent = Agent(
    instructions="Understand overall themes.",
    knowledge={
        "sources": ["novel.txt"],
        "chunker": {
            "type": "token",
            "chunk_size": 1024,  # Large for context
            "chunk_overlap": 256
        }
    }
)
How It Works
Token chunking is deterministic: the same document, processed with the same settings, always produces the same chunks.
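
The idea can be illustrated with a minimal sketch of fixed-size windowing with overlap. This is not the library's implementation: `chunk_tokens` is a hypothetical function, and whitespace splitting stands in for a real tokenizer such as "gpt2".

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=128):
    """Split a token list into fixed-size windows sharing `chunk_overlap` tokens.

    Hypothetical illustration; the actual chunker may treat boundaries differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the token list.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: whitespace tokens instead of real tokenizer output.
tokens = "the quick brown fox jumps over the lazy dog again".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)

for chunk in chunks:
    print(chunk)

# No randomness or content analysis is involved, so a second run matches the first.
assert chunk_tokens(tokens, chunk_size=4, chunk_overlap=1) == chunks
```

Because the window positions depend only on the token count and the two size parameters, re-indexing an unchanged document yields identical chunks.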