PraisonAI integrates chonkie, a high-performance chunking library, to offer several strategies for splitting documents into chunks.

Quick Start

from praisonaiagents import Agent

# Agent with semantic chunking
agent = Agent(
    instructions="Answer questions from documents.",
    knowledge={
        "sources": ["research.pdf"],
        "chunker": {
            "type": "semantic",
            "chunk_size": 512
        }
    }
)

response = agent.start("What are the key findings?")

Available Strategies

| Strategy | Alias | Best For | Speed |
|---|---|---|---|
| Token | token | Fixed-size chunks | ⚡ Fast |
| Sentence | sentence | Natural boundaries | ⚡ Fast |
| Recursive | recursive | Structured documents | ⚡ Fast |
| Semantic | semantic | Topic segmentation | 🔄 Medium |
| SDPM | sdpm | Research papers | 🔄 Medium |
| Late | late | Better embeddings | 🔄 Medium |

Choosing a Strategy
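As a rough guide, the "Best For" column of the strategy table can be read as a lookup from the kind of document you have to a chunker alias. The sketch below encodes that lookup. The helper suggest_chunker and the use-case keys are hypothetical names, not part of PraisonAI, and falling back to "token" simply mirrors the default strategy.

```python
# Hypothetical helper (not part of PraisonAI): maps a rough use case to one of
# the chunker aliases listed in the "Available Strategies" table.
STRATEGY_BY_USE_CASE = {
    "fixed_size": "token",            # Fixed-size chunks, fastest
    "natural_boundaries": "sentence",  # Split on sentence boundaries
    "structured_documents": "recursive",
    "topic_segmentation": "semantic",
    "research_papers": "sdpm",
    "better_embeddings": "late",
}


def suggest_chunker(use_case: str, chunk_size: int = 512) -> dict:
    """Return a "chunker" config dict for the given use case.

    Unknown use cases fall back to "token", the default strategy.
    """
    strategy = STRATEGY_BY_USE_CASE.get(use_case, "token")
    return {"type": strategy, "chunk_size": chunk_size}


print(suggest_chunker("research_papers"))  # {'type': 'sdpm', 'chunk_size': 512}
```

The returned dict has the same shape as the "chunker" value passed inside knowledge in the examples on this page.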

Agent Configuration

Simplest (Default Strategy)

from praisonaiagents import Agent

# Uses token chunking by default
agent = Agent(
    instructions="Answer from documents.",
    knowledge=["docs/"]  # Default chunking
)
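To see what "default chunking" amounts to, the sketch below merges a partial chunker dict with the defaults listed in the All Chunker Options table. The helper resolve_chunker is hypothetical, and the assumption that PraisonAI fills omitted keys in exactly this way is ours, not documented behavior.

```python
from typing import Optional

# Defaults copied from the "All Chunker Options" table on this page.
# embedding_model is left out because it only applies to semantic/sdpm/late.
DEFAULT_CHUNKER = {
    "type": "token",
    "chunk_size": 512,
    "chunk_overlap": 128,
    "tokenizer_or_token_counter": "gpt2",
}


def resolve_chunker(user_config: Optional[dict] = None) -> dict:
    """Fill any options the user left out with the documented defaults.

    Assumption: omitted keys fall back to the table defaults. Illustrative only.
    """
    resolved = dict(DEFAULT_CHUNKER)  # copy so the defaults are never mutated
    resolved.update(user_config or {})
    return resolved


print(resolve_chunker())                      # what knowledge=["docs/"] implies
print(resolve_chunker({"type": "semantic"}))  # only the strategy changes
```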

With Chunking Config

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer from documents.",
    knowledge={
        "sources": ["research.pdf", "data/"],
        "chunker": {
            "type": "semantic",       # Strategy type
            "chunk_size": 512,        # Tokens per chunk
            "chunk_overlap": 128,     # Overlap between chunks
            "embedding_model": "all-MiniLM-L6-v2"  # For semantic/sdpm/late
        }
    }
)
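For intuition about chunk_size and chunk_overlap with the token strategy, here is a minimal sliding-window sketch: each window holds up to chunk_size tokens and advances by chunk_size - chunk_overlap, so consecutive chunks share chunk_overlap tokens. This is not chonkie's implementation. Splitting on whitespace stands in for a real tokenizer such as "gpt2", so the counts here are words rather than model tokens.

```python
def token_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 128) -> list[list[str]]:
    """Split text into fixed-size token windows that overlap by chunk_overlap.

    Whitespace splitting is a stand-in for a real tokenizer (assumption).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
for chunk in token_chunks(words, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

With ten words, a window of 4 and an overlap of 2, this yields four chunks, each starting two words after the previous one.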

All Chunker Options

| Option | Type | Default | Description |
|---|---|---|---|
| type | str | "token" | Chunker type: token, sentence, recursive, semantic, sdpm, or late |
| chunk_size | int | 512 | Target tokens per chunk |
| chunk_overlap | int | 128 | Overlap between chunks |
| tokenizer_or_token_counter | str | "gpt2" | Tokenizer used for counting tokens |
| embedding_model | str | auto | Embedding model (semantic, sdpm, and late only) |
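If you build chunker configs programmatically, a small pre-flight check against this table can catch typos before the agent loads any documents. The function check_chunker_config below is a hypothetical sketch, not a PraisonAI API; PraisonAI and chonkie do their own validation, and the rule that chunk_overlap must be smaller than chunk_size is our assumption for sensible windows.

```python
VALID_TYPES = {"token", "sentence", "recursive", "semantic", "sdpm", "late"}
EMBEDDING_TYPES = {"semantic", "sdpm", "late"}  # types that use embedding_model


def check_chunker_config(config: dict) -> list[str]:
    """Return a list of problems found in a "chunker" dict (empty if none).

    Hypothetical check based on the options table; missing keys use the
    table defaults.
    """
    problems = []
    chunk_type = config.get("type", "token")
    if chunk_type not in VALID_TYPES:
        problems.append(f"unknown type: {chunk_type!r}")
    size = config.get("chunk_size", 512)
    overlap = config.get("chunk_overlap", 128)
    if size <= 0:
        problems.append("chunk_size must be positive")
    if not 0 <= overlap < size:
        problems.append("chunk_overlap must be >= 0 and smaller than chunk_size")
    if "embedding_model" in config and chunk_type not in EMBEDDING_TYPES:
        problems.append(f"embedding_model only applies to semantic/sdpm/late, not {chunk_type!r}")
    return problems


print(check_chunker_config({"type": "semantic", "chunk_size": 512, "chunk_overlap": 128}))  # []
print(check_chunker_config({"type": "token", "chunk_size": 100, "chunk_overlap": 128,
                            "embedding_model": "all-MiniLM-L6-v2"}))
```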

Strategy Details
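The strategies share one config shape and differ mainly in type, with embedding_model relevant only to semantic, sdpm, and late. The sketch below prints an example chunker dict for each alias. The helper example_chunker is hypothetical, and "all-MiniLM-L6-v2" is simply the model used in the configuration example earlier on this page, not a required choice.

```python
EMBEDDING_STRATEGIES = ("semantic", "sdpm", "late")


def example_chunker(strategy: str, chunk_size: int = 512) -> dict:
    """Build an example "chunker" dict for one strategy alias (hypothetical helper).

    embedding_model is only added for the embedding-based strategies,
    matching the options table.
    """
    config = {"type": strategy, "chunk_size": chunk_size}
    if strategy in EMBEDDING_STRATEGIES:
        config["embedding_model"] = "all-MiniLM-L6-v2"
    return config


for name in ("token", "sentence", "recursive", "semantic", "sdpm", "late"):
    print(example_chunker(name))
```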

Installation

Chunking requires the knowledge extra:
pip install "praisonaiagents[knowledge]"
This installs the chonkie library automatically.
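To confirm the extra installed correctly, you can check whether chonkie is importable in the current environment. The sketch below uses only the standard library; has_chonkie is a hypothetical helper name.

```python
import importlib.util


def has_chonkie() -> bool:
    """Return True if the chonkie package can be found in this environment."""
    return importlib.util.find_spec("chonkie") is not None


if not has_chonkie():
    print('chonkie not found; run: pip install "praisonaiagents[knowledge]"')
```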