The simplest and fastest chunking strategy: it splits text into fixed-size token chunks, with optional overlap between consecutive chunks.

Quick Start

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from documents.",
    knowledge={
        "sources": ["document.pdf"],
        "chunker": {
            "type": "token",
            "chunk_size": 256,
            "chunk_overlap": 50
        }
    }
)

response = agent.start("Summarize the main points")

When to Use

Good For

  • Speed-critical applications
  • Workloads that need uniform chunk sizes
  • Simple documents without structure
  • High-volume processing

Consider Alternatives

  • Documents with natural sections
  • Topic-dependent content
  • Content that needs semantic coherence
  • Complex nested structures

Parameters

Parameter        Type   Default   Description
chunk_size       int    512       Number of tokens per chunk
chunk_overlap    int    128       Token overlap between chunks
tokenizer        str    "gpt2"    Tokenizer for counting tokens
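
All three settings go in the chunker config. As a sketch, a config that makes the defaults explicit, assuming the tokenizer name is passed straight through as the table above suggests:

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from documents.",
    knowledge={
        "sources": ["document.pdf"],
        "chunker": {
            "type": "token",
            "chunk_size": 512,     # default: tokens per chunk
            "chunk_overlap": 128,  # default: tokens shared with the previous chunk
            "tokenizer": "gpt2"    # default; assumed to accept other tokenizer names
        }
    }
)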

Examples

Small Chunks for Precision

agent = Agent(
    instructions="Find specific details.",
    knowledge={
        "sources": ["technical_spec.pdf"],
        "chunker": {
            "type": "token",
            "chunk_size": 128,   # Small for precision
            "chunk_overlap": 32
        }
    }
)

Large Chunks for Context

agent = Agent(
    instructions="Understand overall themes.",
    knowledge={
        "sources": ["novel.txt"],
        "chunker": {
            "type": "token",
            "chunk_size": 1024,  # Large for context
            "chunk_overlap": 256
        }
    }
)

How It Works

Token chunking is deterministic: the same document always produces the same chunks. The text is first encoded with the configured tokenizer, then sliced into windows of chunk_size tokens. Each window starts chunk_size - chunk_overlap tokens after the previous one, so consecutive chunks share exactly chunk_overlap tokens; chunk_overlap must therefore be smaller than chunk_size.
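
A minimal sketch of this windowing, using tiktoken's "gpt2" encoding as a stand-in for the configured tokenizer (the library's actual implementation may differ):

import tiktoken

def token_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 128) -> list[str]:
    # Encode the whole document into token ids.
    enc = tiktoken.get_encoding("gpt2")
    tokens = enc.encode(text)
    # Each chunk starts (chunk_size - chunk_overlap) tokens after the last,
    # so neighboring chunks share chunk_overlap tokens.
    stride = chunk_size - chunk_overlap
    return [
        enc.decode(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), stride)
    ]

Running this twice on the same text yields identical chunk lists, which is what makes the strategy deterministic.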