PraisonAI integrates chonkie for high-performance document chunking.

Quick Start

```python
from praisonaiagents import Agent

# Default chunking (token-based)
agent = Agent(
    instructions="Answer questions from documents.",
    knowledge=["research.pdf", "docs/"]
)

response = agent.start("What are the key findings?")
```

Available Strategies

| Strategy | Best For | Speed |
|---|---|---|
| `token` | Fixed-size chunks | ⚡ Fastest |
| `sentence` | Natural boundaries | ⚡ Fast |
| `recursive` | Structured docs (markdown) | ⚡ Fast |
| `semantic` | Topic segmentation | 🔄 Medium |
| `sdpm` | Research papers | 🔄 Medium |
| `late` | Best embeddings | 🔄 Medium |
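
The `sentence` strategy listed above keeps chunks aligned to natural sentence boundaries. As a rough illustration of that idea (a minimal pure-Python sketch, not chonkie's actual implementation, which is tokenizer-aware), sentences can be packed greedily into chunks without ever splitting one mid-sentence:

```python
import re

def sentence_chunks(text, max_chars):
    """Greedy sentence-based chunking: pack whole sentences into
    chunks of up to max_chars, never splitting mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks

text = "Chunking splits text. Each piece stays coherent. Retrieval improves."
chunks = sentence_chunks(text, max_chars=40)
# Every chunk ends at a sentence boundary.
```

The trade-off versus `token` chunking is visible here: chunk sizes vary, but no chunk ever contains a dangling sentence fragment.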

Chunker Configuration

All Parameters

| Parameter | Type | Default | Applies To |
|---|---|---|---|
| `type` | str | `"token"` | All |
| `chunk_size` | int | `512` | All |
| `chunk_overlap` | int | `128` | token, sentence |
| `tokenizer_or_token_counter` | str | `"gpt2"` | token, sentence, recursive |
| `embedding_model` | str | auto | semantic, sdpm, late |
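
To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of fixed-size chunking with overlap (illustrative only; chonkie's chunkers additionally count real tokens via the configured tokenizer):

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into fixed-size chunks where consecutive
    chunks share chunk_overlap tokens (stride = chunk_size - overlap)."""
    stride = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), stride)]

tokens = list(range(1000))  # stand-in for tokenized text
chunks = chunk_tokens(tokens, chunk_size=256, chunk_overlap=50)
# Each chunk holds at most 256 tokens; adjacent chunks share 50,
# so context spanning a boundary is never lost entirely.
```

The overlap is why effective stride is `chunk_size - chunk_overlap`: larger overlap means more redundancy in the index but better recall for text near chunk boundaries.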

Strategy Examples

```python
agent = Agent(
    instructions="Process documents.",
    knowledge={
        "sources": ["docs/"],
        "chunker": {
            "type": "token",
            "chunk_size": 256,
            "chunk_overlap": 50
        }
    }
)
```
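
A semantic configuration follows the same shape, assuming the `type` and `embedding_model` keys behave as the parameter table describes. The model name below is purely illustrative, not a PraisonAI default:

```python
# Hedged sketch: semantic chunking config, per the parameter table above.
knowledge_config = {
    "sources": ["papers/"],
    "chunker": {
        "type": "semantic",
        "chunk_size": 512,
        "embedding_model": "all-MiniLM-L6-v2",  # assumed model name, default is auto
    },
}
```

This dict would be passed as `knowledge=knowledge_config` when constructing the `Agent`, mirroring the token example above.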

Choosing a Strategy

Start with the default `token` chunker: it is the fastest and works well for most documents. Prefer `sentence` or `recursive` when chunks should respect natural or structural boundaries (for example, markdown headings), and reach for `semantic`, `sdpm`, or `late` when topical coherence and retrieval quality matter more than indexing speed.

Installation

```bash
pip install "praisonaiagents[knowledge]"
```

This installs the chonkie library automatically.