# RAG Overview
Retrieval Augmented Generation (RAG) combines the power of large language models with your own data. Instead of relying solely on the model’s training data, RAG retrieves relevant information from your documents and uses it to generate accurate, grounded answers.

## How It Works
- Index: Your documents are chunked and stored as embeddings
- Retrieve: When you ask a question, relevant chunks are found
- Generate: The LLM uses retrieved context to answer
- Cite: Sources are tracked for transparency
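A minimal sketch of those four steps in plain Python follows. Everything here is illustrative: `embed` and `llm` are stand-ins for a real embedding model and a real LLM call, and sentence splitting stands in for proper chunking.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedder: a real pipeline calls an embedding model here.
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def llm(prompt: str) -> str:
    # Placeholder: swap in a real completion call (OpenAI, Ollama, ...).
    return f"(answer grounded in: {prompt!r})"

# 1. Index: chunk each document and store embeddings with their source.
docs = {"guide.md": "RAG retrieves relevant chunks. The LLM answers using them."}
index = [
    {"source": src, "chunk": chunk, "vec": embed(chunk)}
    for src, text in docs.items()
    for chunk in text.split(". ")
]

# 2. Retrieve: rank chunks by cosine similarity to the question.
def retrieve(question: str, k: int = 2) -> list[dict]:
    q = embed(question)
    ranked = sorted(index, key=lambda e: -sum(a * b for a, b in zip(q, e["vec"])))
    return ranked[:k]

# 3. Generate and 4. Cite: answer from retrieved context, tracking sources.
def answer(question: str) -> tuple[str, list[str]]:
    hits = retrieve(question)
    context = "\n".join(h["chunk"] for h in hits)
    reply = llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
    return reply, sorted({h["source"] for h in hits})
```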
## Architecture
PraisonAI’s RAG is built on a simple principle: Knowledge indexes; RAG answers with citations.
- Knowledge: Handles document ingestion, chunking, embedding, and retrieval
- RAG: Thin orchestrator that combines Knowledge retrieval with LLM generation
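As a sketch of that separation, `Knowledge` below owns the data side while `RAG` stays a thin wrapper. The classes and method names are illustrative assumptions, not PraisonAI's actual implementation, and retrieval is kept deliberately simple.

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    """Owns document ingestion, chunking, and retrieval."""
    store: list[dict] = field(default_factory=list)

    def add(self, source: str, text: str) -> None:
        # Naive sentence chunking; real ingestion also embeds each chunk.
        for chunk in text.split(". "):
            self.store.append({"source": source, "chunk": chunk})

    def retrieve(self, query: str, k: int = 3) -> list[dict]:
        # Toy word-overlap relevance; real retrieval uses embeddings.
        words = set(query.lower().split())
        ranked = sorted(
            self.store,
            key=lambda e: -len(words & set(e["chunk"].lower().split())),
        )
        return ranked[:k]

@dataclass
class RAG:
    """Thin orchestrator: retrieve context, generate, attach citations."""
    knowledge: Knowledge

    def ask(self, question: str) -> dict:
        hits = self.knowledge.retrieve(question)
        context = "\n".join(h["chunk"] for h in hits)
        # Placeholder for a real LLM call over the retrieved context.
        answer = f"(grounded answer using: {context})"
        return {"answer": answer, "citations": sorted({h["source"] for h in hits})}
```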
## Quick Example
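Using the illustrative `Knowledge`/`RAG` classes from the Architecture sketch above (again, not PraisonAI's real API), an end-to-end run looks like this:

```python
kb = Knowledge()
kb.add(
    "handbook.md",
    "Refunds are processed within 5 business days. Contact support for exceptions.",
)

rag = RAG(knowledge=kb)
result = rag.ask("How long do refunds take?")

print(result["answer"])     # grounded in the retrieved handbook chunk
print(result["citations"])  # ['handbook.md']
```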
## When to Use RAG
| Use Case | RAG Helps? |
|---|---|
| Q&A over documents | ✅ Yes |
| Summarizing reports | ✅ Yes |
| Code documentation lookup | ✅ Yes |
| General knowledge questions | ❌ No (use base LLM) |
| Real-time data | ❌ No (use tools/APIs) |
## Key Features
- **Citations**: Every answer includes source references
- **Streaming**: Real-time response streaming
- **Multi-Agent**: Share knowledge across agents (see the sketch after this list)
- **CLI Support**: Full CLI for indexing and querying
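For the multi-agent case, the important property is that a single knowledge store can back several agents, so documents are indexed once. A short sketch with the illustrative classes from above:

```python
shared = Knowledge()
shared.add(
    "policies.md",
    "All deployments require a rollback plan. Releases ship on Tuesdays.",
)

# Two orchestrators querying the same indexed knowledge, no re-ingestion.
support_rag = RAG(knowledge=shared)
release_rag = RAG(knowledge=shared)

print(support_rag.ask("What do deployments require?")["citations"])  # ['policies.md']
print(release_rag.ask("When do releases ship?")["citations"])        # ['policies.md']
```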