Readers Module
The Readers module provides concrete implementations for loading documents from various sources into the knowledge base.
Import
Quick Example
Features
- Automatic source type detection and routing
- Multiple reader implementations (Text, MarkItDown, Directory, URL, Glob)
- Lazy loading of optional dependencies
- Metadata preservation for loaded documents
- Recursive directory traversal with exclusion patterns
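As an illustration of the lazy-loading feature listed above, here is a minimal sketch of deferring an optional import until it is first needed. The names `lazy_import` and `LazyConverterReader` are hypothetical and are not part of the Readers module; the sketch only shows the general pattern.

```python
import importlib


def lazy_import(module_name: str, install_hint: str):
    """Import a module on demand and raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this reader. "
            f"Install it with: {install_hint}"
        ) from exc


class LazyConverterReader:
    """Hypothetical reader that defers its optional dependency (assumption)."""

    def __init__(self, module_name: str = "markitdown"):
        self._module_name = module_name
        self._module = None  # nothing is imported at construction time

    def _backend(self):
        # Import on first use, then reuse the cached module object.
        if self._module is None:
            self._module = lazy_import(
                self._module_name, f"pip install {self._module_name}"
            )
        return self._module
```

With this pattern, constructing the reader never fails because of a missing package; the error only surfaces when a document that needs the dependency is actually loaded.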
Classes
AutoReader
Automatic reader that detects source type and routes to the appropriate reader.
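The detect-and-route step can be sketched roughly as below. This is an assumption about how such detection might work, not AutoReader's actual logic; `detect_source_type` and the returned labels are hypothetical names chosen for illustration.

```python
import os
from urllib.parse import urlparse


def detect_source_type(source: str) -> str:
    """Classify a source string as 'url', 'directory', 'glob', or 'file'."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Check real directories before glob characters, since a directory
    # name may legitimately contain '[' or '?'.
    if os.path.isdir(source):
        return "directory"
    if any(ch in source for ch in "*?["):
        return "glob"
    return "file"
```

A router built on this would then hand the source to the matching reader, for example a directory reader for `"directory"` and an extension-based lookup for `"file"`.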
TextReader
Simple text file reader for plain text files.
Supported extensions: .txt, .text, .log
MarkItDownReader
Document reader using markitdown for rich document conversion.
Supported extensions: .pdf, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .html, .htm, .md, .markdown, .csv, .json, .xml, .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .mp3, .wav, .ogg, .m4a, .flac
Requires the markitdown package: pip install markitdown
DirectoryReader
Recursively reads all files in a directory.
| Parameter | Type | Default | Description |
|---|---|---|---|
| recursive | bool | True | Recursively traverse subdirectories |
| exclude_patterns | List[str] | See below | Glob patterns to exclude |
Default exclude patterns: *.pyc, __pycache__, .git, .svn, node_modules, *.egg-info, .env, .venv, venv
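To make the interaction between recursive and exclude_patterns concrete, here is a minimal sketch using only the standard library. It assumes patterns are matched against individual file and directory names with `fnmatch`; the real DirectoryReader may match differently. `iter_files` and `is_excluded` are hypothetical helpers.

```python
import fnmatch
import os
from typing import Iterator, List, Optional

# Default patterns as listed in this section.
DEFAULT_EXCLUDES = ["*.pyc", "__pycache__", ".git", ".svn", "node_modules",
                    "*.egg-info", ".env", ".venv", "venv"]


def is_excluded(name: str, patterns: List[str]) -> bool:
    """Return True if a single path component matches any glob pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def iter_files(root: str, recursive: bool = True,
               exclude_patterns: Optional[List[str]] = None) -> Iterator[str]:
    """Yield file paths under root, skipping excluded files and directories."""
    patterns = DEFAULT_EXCLUDES if exclude_patterns is None else exclude_patterns
    for dirpath, dirnames, filenames in os.walk(root):
        if recursive:
            # Prune in place so os.walk never descends into excluded dirs.
            dirnames[:] = sorted(d for d in dirnames
                                 if not is_excluded(d, patterns))
        else:
            dirnames[:] = []  # stop after the top-level directory
        for name in sorted(filenames):
            if not is_excluded(name, patterns):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place avoids walking large trees such as node_modules at all, which is usually cheaper than filtering the results afterwards.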
Methods
load(source, metadata=None)
Load documents from a source.
Parameters:
- source (str): File path, directory, URL, or glob pattern
- metadata (dict, optional): Additional metadata to attach to documents
Returns: List[Document] - List of loaded documents
can_handle(source)
Check if the reader can handle the given source.
Parameters:
- source (str): Source to check
Returns: bool - True if the reader can handle this source
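The two methods above suggest a small reader interface. The following is a self-contained sketch of that shape, not the module's own code: the `Document` dataclass, the `SimpleTextReader` class, and the `source` and `filename` metadata keys are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for the module's Document type."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SimpleTextReader:
    """Illustrative reader exposing load() and can_handle()."""

    extensions = (".txt", ".text", ".log")

    def can_handle(self, source: str) -> bool:
        # Decide purely from the file extension, case-insensitively.
        return source.lower().endswith(self.extensions)

    def load(self, source: str,
             metadata: Optional[Dict[str, Any]] = None) -> List[Document]:
        if not self.can_handle(source):
            raise ValueError(f"Unsupported source: {source}")
        with open(source, "r", encoding="utf-8") as fh:
            text = fh.read()
        # Start from reader-provided metadata, then let the caller override.
        merged = {"source": source, "filename": os.path.basename(source)}
        merged.update(metadata or {})
        return [Document(content=text, metadata=merged)]
```

In this sketch caller-supplied metadata takes precedence over the reader's defaults; whether the real readers merge in that order is not stated in this section.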
Example: Custom Metadata
Example: URL Reading
CLI Usage
Related
- Vector Store Module - Store loaded documents
- Retrieval Module - Retrieve documents

