Data Readers Module
The readers module provides a protocol-based system for loading documents from various sources into the knowledge base.
Quick Start
Classes
Document
A lightweight dataclass representing a loaded document.
- text - The document content as plain text
- metadata - Optional metadata dictionary (source, title, etc.)
- doc_id - Optional unique identifier
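The three fields above map naturally onto a standard-library dataclass. The following is a minimal sketch, not the module's actual definition: the field names come from the list above, but the type annotations and default values (an empty dict for metadata, None for doc_id) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:
    # Required: the document content as plain text.
    text: str
    # Assumed default: a fresh empty dict per instance (never a shared mutable).
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumed default: no identifier until the caller assigns one.
    doc_id: Optional[str] = None


# Usage: only the text is required; the optional fields can be added as needed.
doc = Document(text="Hello, world.", metadata={"source": "notes.txt"})
```

Using `field(default_factory=dict)` rather than a literal `{}` keeps instances from sharing one metadata dictionary.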
ReaderProtocol
Protocol defining the interface for document readers.
ReaderRegistry
Registry for managing and discovering readers.
Utility Functions
detect_source_kind
Detect the type of a source without importing heavy libraries.
get_file_extension
Extract the file extension from a path or URL.
Supported File Types
| Extension | Reader | Description |
|---|---|---|
| txt, text | TextReader | Plain text files |
| md, markdown | MarkdownReader | Markdown documents |
| json, jsonl | JSONReader | JSON and JSON Lines |
| csv, tsv | CSVReader | Tabular data |
| html, htm | HTMLReader | Web pages |
| pdf | PDFReader | PDF documents |
| docx, doc | DocxReader | Word documents |
| xlsx, xls | ExcelReader | Spreadsheets |
| pptx, ppt | PowerPointReader | Presentations |
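The table above amounts to a lookup from file extension to reader. The sketch below illustrates that idea with two hypothetical helpers, `file_extension` and `reader_name_for`, and a hypothetical `EXTENSION_TO_READER` dict. None of these names are part of the module, and the real `get_file_extension` may differ in signature and return format.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension -> reader name, transcribed from the table above.
EXTENSION_TO_READER = {
    "txt": "TextReader", "text": "TextReader",
    "md": "MarkdownReader", "markdown": "MarkdownReader",
    "json": "JSONReader", "jsonl": "JSONReader",
    "csv": "CSVReader", "tsv": "CSVReader",
    "html": "HTMLReader", "htm": "HTMLReader",
    "pdf": "PDFReader",  # extension assumed from the reader name
    "docx": "DocxReader", "doc": "DocxReader",
    "xlsx": "ExcelReader", "xls": "ExcelReader",
    "pptx": "PowerPointReader", "ppt": "PowerPointReader",
}


def file_extension(source: str) -> str:
    """Return the lowercase extension of a POSIX-style path or URL, without the dot."""
    # For URLs, use only the path component so query strings are ignored.
    path = urlparse(source).path if "://" in source else source
    return PurePosixPath(path).suffix.lstrip(".").lower()


def reader_name_for(source: str) -> Optional[str]:
    """Look up the reader name for a source, or None if the extension is unknown."""
    return EXTENSION_TO_READER.get(file_extension(source))
```

For example, `reader_name_for("report.PDF")` returns `"PDFReader"`, and a source with no extension, such as `"README"`, returns `None`.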
Creating Custom Readers
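Because the system is protocol-based, a custom reader presumably only needs to provide the methods the protocol requires, with no base class to inherit from. The sketch below shows the general shape under stated assumptions: the method names `can_read` and `read`, the `register` and `find` methods, and the `LogReader` class are all hypothetical, and the stand-in `Document`, `ReaderProtocol`, and `ReaderRegistry` classes are defined here only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class Document:
    # Stand-in for the module's Document; defaults are assumptions.
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None


@runtime_checkable
class ReaderProtocol(Protocol):
    # Assumed interface; the real protocol's method names may differ.
    def can_read(self, source: str) -> bool: ...
    def read(self, source: str) -> list[Document]: ...


class LogReader:
    """Hypothetical custom reader: loads a .log file as a single Document."""

    def can_read(self, source: str) -> bool:
        return source.lower().endswith(".log")

    def read(self, source: str) -> list[Document]:
        text = Path(source).read_text(encoding="utf-8")
        return [Document(text=text, metadata={"source": source})]


class ReaderRegistry:
    # Toy registry for illustration; not the module's ReaderRegistry API.
    def __init__(self) -> None:
        self._readers: dict[str, ReaderProtocol] = {}

    def register(self, name: str, reader: ReaderProtocol) -> None:
        # runtime_checkable only verifies that the methods exist.
        if not isinstance(reader, ReaderProtocol):
            raise TypeError(f"{name} does not implement ReaderProtocol")
        self._readers[name] = reader

    def find(self, source: str) -> Optional[ReaderProtocol]:
        # Return the first registered reader that accepts the source.
        for reader in self._readers.values():
            if reader.can_read(source):
                return reader
        return None


registry = ReaderRegistry()
registry.register("log", LogReader())
```

`LogReader` never subclasses `ReaderProtocol`; structural typing is what lets it pass the `isinstance` check in `register`.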
Performance
- Zero heavy imports at module level
- Readers are lazy-loaded when first accessed
- No chromadb, torch, or sentence_transformers dependencies
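One common way to get the lazy-loading behavior listed above is to register each reader by import path and import it only on first access. The sketch below illustrates that pattern with a hypothetical `LazyReaderRegistry`; it is not the module's actual mechanism, and the standard-library `json` and `csv` modules stand in for heavier reader modules.

```python
import importlib
from typing import Any


class LazyReaderRegistry:
    """Hypothetical registry that defers imports until a reader is requested."""

    def __init__(self) -> None:
        # name -> (module path, attribute name); nothing is imported yet.
        self._targets: dict[str, tuple[str, str]] = {}
        # name -> resolved object, filled in on first access.
        self._loaded: dict[str, Any] = {}

    def register_lazy(self, name: str, module_path: str, attr: str) -> None:
        self._targets[name] = (module_path, attr)

    def get(self, name: str) -> Any:
        if name not in self._loaded:
            # Raises KeyError for names that were never registered.
            module_path, attr = self._targets[name]
            # The import cost is paid here, once, on first access.
            module = importlib.import_module(module_path)
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]

    def loaded_names(self) -> list[str]:
        return sorted(self._loaded)


registry = LazyReaderRegistry()
registry.register_lazy("json", "json", "loads")
registry.register_lazy("csv", "csv", "reader")
```

Registration stores only strings, so importing the registry itself stays cheap; a reader's dependencies are loaded only when `get` is first called for that reader.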

