Splits text at sentence boundaries while respecting token limits. Preserves natural reading flow.

Quick Start

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from documents.",
    knowledge={
        "sources": ["article.pdf"],
        "chunker": {
            "type": "sentence",
            "chunk_size": 512,
            "chunk_overlap": 64
        }
    }
)

response = agent.start("What are the main arguments?")

When to Use

Good For

  • Articles and blog posts
  • Natural language content
  • Content where readability matters
  • Question-answering tasks

Consider Alternatives

  • Code or technical docs
  • Very long sentences
  • Structured data
  • Markdown with headers

Parameters

Parameter                     Type   Default   Description
chunk_size                    int    512       Max tokens per chunk
chunk_overlap                 int    128       Token overlap between chunks
tokenizer_or_token_counter    str    "gpt2"    Tokenizer used to count tokens

Examples

News Articles

agent = Agent(
    instructions="Summarize news articles.",
    knowledge={
        "sources": ["news/"],
        "chunker": {
            "type": "sentence",
            "chunk_size": 256  # Short chunks for news
        }
    }
)

Long-form Content

agent = Agent(
    instructions="Analyze essays and papers.",
    knowledge={
        "sources": ["essays/"],
        "chunker": {
            "type": "sentence",
            "chunk_size": 1024,
            "chunk_overlap": 128
        }
    }
)

How It Works

Text is first split into sentences. Consecutive sentences are grouped into a chunk until adding the next sentence would exceed the chunk_size token limit; a new chunk then starts, carrying over roughly chunk_overlap tokens of trailing sentences from the previous chunk so context is preserved across chunk boundaries.
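The grouping can be sketched in plain Python. This is an illustrative reimplementation, not the library's actual code: it uses a naive regex sentence splitter and a whitespace word count in place of a real tokenizer, and the function name sentence_chunks is hypothetical.

```python
import re

def sentence_chunks(text, chunk_size=512, chunk_overlap=0,
                    count_tokens=lambda s: len(s.split())):
    """Group sentences into chunks of at most chunk_size tokens (sketch)."""
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_tokens = [], [], 0
    for sent in sentences:
        n = count_tokens(sent)
        if current and current_tokens + n > chunk_size:
            # Close the current chunk.
            chunks.append(" ".join(current))
            # Carry trailing sentences forward until the overlap budget is hit.
            overlap, overlap_tokens = [], 0
            for prev in reversed(current):
                t = count_tokens(prev)
                if overlap_tokens + t > chunk_overlap:
                    break
                overlap.insert(0, prev)
                overlap_tokens += t
            current, current_tokens = overlap, overlap_tokens
        current.append(sent)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "One two three. Four five. Six seven eight nine."
print(sentence_chunks(text, chunk_size=6))
# → ['One two three. Four five.', 'Six seven eight nine.']
```

With chunk_overlap > 0, the second chunk would additionally start with the trailing sentences of the first, which is what keeps a question answerable even when its supporting context straddles a boundary.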