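Late chunking embeds the entire document first and only then splits it into chunks. Because each chunk's embedding is derived from the full-document pass, it retains context from the surrounding text instead of representing the chunk in isolation.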

Quick Start

from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions with high precision.",
    knowledge={
        "sources": ["technical_docs/"],
        "chunker": {
            "type": "late",
            "chunk_size": 512,
            "embedding_model": "all-MiniLM-L6-v2"
        }
    }
)

response = agent.start("Explain the architecture")

When to Use

  • High-precision retrieval needed
  • Quality matters more than speed
  • Complex technical documents
  • Semantic similarity search critical

Parameters

Parameter        Type  Default  Description
chunk_size       int   512      Max tokens per chunk
embedding_model  str   auto     Embedding model

How It Works

Traditional chunking: Split → Embed each chunk
Late chunking: Embed full doc → Split with context awareness

This preserves document-level context in each chunk’s embedding.
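
The difference between the two orderings can be illustrated with a small, dependency-free sketch. This is not the library's implementation: `toy_token_vectors` and `contextualize` are hypothetical stand-ins for a real embedding model (the "context" here is just a blend with the mean of the input), and chunks are fixed-size token windows pooled by averaging. The only point is to show that, under late chunking, a chunk's embedding depends on text outside the chunk.

```python
from typing import List, Tuple

Vector = List[float]


def toy_token_vectors(tokens: List[str]) -> List[Vector]:
    # Hypothetical stand-in for a model's raw per-token vectors (2 dimensions).
    return [[float(len(t)), float(ord(t[0]) % 7)] for t in tokens]


def contextualize(vectors: List[Vector]) -> List[Vector]:
    # Crude stand-in for attention: blend each token with the mean of
    # everything the "model" sees in this single pass.
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in vectors]


def mean_pool(vectors: List[Vector]) -> Vector:
    # Average token vectors into one chunk embedding.
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def split_spans(n_tokens: int, chunk_size: int) -> List[Tuple[int, int]]:
    # Fixed-size windows of at most chunk_size tokens.
    return [(s, min(s + chunk_size, n_tokens)) for s in range(0, n_tokens, chunk_size)]


def traditional_chunk_embeddings(tokens: List[str], chunk_size: int) -> List[Vector]:
    # Split -> embed each chunk: the model only ever sees one chunk at a time.
    return [
        mean_pool(contextualize(toy_token_vectors(tokens[s:e])))
        for s, e in split_spans(len(tokens), chunk_size)
    ]


def late_chunk_embeddings(tokens: List[str], chunk_size: int) -> List[Vector]:
    # Embed full doc -> split: one pass over all tokens, then pool per chunk.
    contextual = contextualize(toy_token_vectors(tokens))
    return [mean_pool(contextual[s:e]) for s, e in split_spans(len(tokens), chunk_size)]


if __name__ == "__main__":
    doc = "the model uses attention it is fast".split()
    print("traditional:", traditional_chunk_embeddings(doc, chunk_size=4))
    print("late:       ", late_chunk_embeddings(doc, chunk_size=4))
```

In this toy setup, changing a word in the first chunk leaves the traditional embedding of the second chunk untouched but shifts its late-chunking embedding, which is the "document-level context" effect described above. The trade-off is that the full document has to pass through the model at once, which is consistent with the guidance to prefer late chunking when quality matters more than speed.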