Recursive chunking splits text hierarchically using multiple separators (paragraphs → sentences → words). It is ideal for structured documents such as Markdown.

Quick Start

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from documentation.",
    knowledge={
        "sources": ["docs/"],
        "chunker": {
            "type": "recursive",
            "chunk_size": 512
        }
    }
)

response = agent.start("How do I configure the settings?")

When to Use

Good For

  • Markdown documentation
  • Technical manuals
  • Structured content
  • Code with comments

Consider Alternatives

  • Unstructured prose
  • Stream of consciousness
  • Very short documents
  • Topic-based splitting needed

Parameters

Parameter                    Type   Default   Description
chunk_size                   int    512       Max tokens per chunk
tokenizer_or_token_counter   str    "gpt2"    Tokenizer for counting
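
A configuration that sets both parameters might look like the sketch below. It assumes tokenizer_or_token_counter is accepted in the same chunker dict as chunk_size; the table lists the parameter, but none of the examples on this page show where it goes, so treat the placement as an assumption. The variable name knowledge_config is illustrative only.

```python
# Sketch of a knowledge configuration that sets both documented parameters.
# Assumption: "tokenizer_or_token_counter" lives in the same "chunker" dict
# as "chunk_size"; this placement is not confirmed by the page.
knowledge_config = {
    "sources": ["docs/"],
    "chunker": {
        "type": "recursive",
        "chunk_size": 512,                     # max tokens per chunk (documented default)
        "tokenizer_or_token_counter": "gpt2",  # tokenizer used for counting (documented default)
    },
}

# The dict would then be passed as Agent(..., knowledge=knowledge_config),
# following the Quick Start pattern.
```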

Examples

Documentation

from praisonaiagents import Agent

agent = Agent(
    instructions="Help users find answers in docs.",
    knowledge={
        "sources": ["README.md", "docs/"],
        "chunker": {
            "type": "recursive",
            "chunk_size": 512
        }
    }
)

Large Codebase

from praisonaiagents import Agent

agent = Agent(
    instructions="Explain code and architecture.",
    knowledge={
        "sources": ["src/"],
        "chunker": {
            "type": "recursive",
            "chunk_size": 1024
        }
    }
)

How It Works

The recursive approach tries the largest separators first (paragraph breaks) and falls back to smaller ones (sentences, then words) only when a piece is still larger than chunk_size.
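
The fallback behavior can be illustrated with a small standalone sketch. This is not the library's implementation: the LEVELS hierarchy, the recursive_split function, and the whitespace-based count_tokens stand-in are all hypothetical, chosen only to show the idea of splitting at the coarsest boundary that keeps each chunk within the limit.

```python
import re

# Hypothetical hierarchy: (regex to split on, string used to rejoin pieces).
LEVELS = [
    (r"\n\s*\n", "\n\n"),     # paragraphs
    (r"(?<=[.!?])\s+", " "),  # sentences (split after ., ! or ?)
    (r"\s+", " "),            # words
]


def count_tokens(text):
    """Stand-in counter: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def recursive_split(text, chunk_size, level=0):
    """Return chunks of at most chunk_size tokens, preferring coarse boundaries."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= chunk_size or level >= len(LEVELS):
        # Fits already, or no finer separator is left to try.
        return [text]

    pattern, joiner = LEVELS[level]
    pieces = [p for p in re.split(pattern, text) if p.strip()]

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + joiner + piece
        if count_tokens(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: fall back to the next separator.
            chunks.extend(recursive_split(piece, chunk_size, level + 1))
    if current:
        chunks.append(current)
    return chunks


sample = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda."
chunks = recursive_split(sample, chunk_size=4)
```

In this sketch the first paragraph is split at its sentence boundary, while the second paragraph is a single long sentence and is therefore split at word level. A real chunker would count tokens with the configured tokenizer rather than by whitespace.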

Best Practices

  1. Match chunk size to content density - Dense technical docs need smaller chunks
  2. Use with Markdown - Recursive chunking respects Markdown structure well
  3. Combine with semantic search - The hierarchical splits provide logical boundaries for retrieval