Readers Module

The Readers module provides concrete reader implementations that load documents from files, directories, URLs, and glob patterns into the knowledge base.

Import

from praisonai.adapters import AutoReader, TextReader, MarkItDownReader, DirectoryReader

Quick Example

from praisonai.adapters import AutoReader

# AutoReader automatically detects source type
reader = AutoReader()

# Load from file
docs = reader.load("document.pdf")

# Load from directory
docs = reader.load("./docs/")

# Load from URL
docs = reader.load("https://example.com/page.html")

Features

Automatic source type detection and routing
Multiple reader implementations (Text, MarkItDown, Directory, URL, Glob)
Lazy loading of optional dependencies
Metadata preservation for loaded documents
Recursive directory traversal with exclusion patterns
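
To make the idea of source type detection concrete, here is a minimal sketch of how a source string could be classified before it is routed to a reader. It is an illustration only, not the library's implementation: the helper name detect_source_type and the order of the checks are assumptions, and it uses only the Python standard library.

```python
import os
from urllib.parse import urlparse


def detect_source_type(source: str) -> str:
    """Classify a source string as 'url', 'glob', 'directory', or 'file'.

    Hypothetical helper for illustration; not part of praisonai.
    """
    parsed = urlparse(source)
    # Treat only http(s) sources with a host as URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Any shell-style wildcard character marks the source as a glob pattern.
    if any(ch in source for ch in "*?["):
        return "glob"
    # An existing directory on disk is read as a directory.
    if os.path.isdir(source):
        return "directory"
    # Everything else is assumed to be a single file path.
    return "file"
```

A router built on such a check could then pick a URL, glob, directory, or file reader based on the returned label.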

Classes

`AutoReader`

Detects the source type (file, directory, URL, or glob pattern) and routes the request to the appropriate reader.

from praisonai.adapters import AutoReader

reader = AutoReader()

# Handles files, directories, URLs, and glob patterns
docs = reader.load("report.pdf")
docs = reader.load("./documents/")
docs = reader.load("https://example.com")
docs = reader.load("*.md")

`TextReader`

Reads plain text files.

from praisonai.adapters import TextReader

reader = TextReader()
docs = reader.load("notes.txt")

Supported Extensions: .txt, .text, .log

`MarkItDownReader`

Converts rich documents (PDF, Office files, HTML, images, audio) into text using the markitdown package.

from praisonai.adapters import MarkItDownReader

reader = MarkItDownReader()
docs = reader.load("report.pdf")

Supported Extensions: .pdf, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .html, .htm, .md, .markdown, .csv, .json, .xml, .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .mp3, .wav, .ogg, .m4a, .flac

Requires the markitdown package: pip install markitdown
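
Because markitdown is an optional dependency, a reader can defer the import until it is actually needed. The sketch below shows one common way to do that with the standard library; the helper name lazy_import and the error message are assumptions, and this is not necessarily how the library loads markitdown.

```python
import importlib


def lazy_import(module_name: str, pip_name: str):
    """Import a module on first use, with an actionable error if it is missing.

    Hypothetical helper for illustration; not part of praisonai.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an install hint so the caller knows how to fix it.
        raise ImportError(
            f"{module_name} is required for this reader. "
            f"Install it with: pip install {pip_name}"
        ) from exc
```

A reader would call lazy_import("markitdown", "markitdown") inside its load method, so importing the readers package itself never fails when the optional package is absent.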

`DirectoryReader`

Reads the files in a directory, recursing into subdirectories by default and skipping paths that match the exclusion patterns.

from praisonai.adapters import DirectoryReader

reader = DirectoryReader(
    recursive=True,
    exclude_patterns=["*.pyc", "__pycache__", ".git", "node_modules"]
)
docs = reader.load("./project/")

Parameters:

Parameter	Type	Default	Description
`recursive`	`bool`	`True`	Recursively traverse subdirectories
`exclude_patterns`	`List[str]`	See below	Glob patterns to exclude

Default Exclusions: *.pyc, __pycache__, .git, .svn, node_modules, *.egg-info, .env, .venv, venv
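
The traversal described above can be sketched with os.walk and fnmatch. This is an assumption about how such filtering might work, not the library's code: the names is_excluded and iter_files are hypothetical, and matching each file or directory name (rather than the full path) against the patterns is a design choice made for this example.

```python
import fnmatch
import os
from typing import Iterator, List, Optional

# Mirrors the default exclusions listed in this section.
DEFAULT_EXCLUDES = [
    "*.pyc", "__pycache__", ".git", ".svn", "node_modules",
    "*.egg-info", ".env", ".venv", "venv",
]


def is_excluded(name: str, patterns: List[str]) -> bool:
    """Return True if a file or directory name matches any glob pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def iter_files(
    root: str,
    recursive: bool = True,
    exclude_patterns: Optional[List[str]] = None,
) -> Iterator[str]:
    """Yield file paths under root, skipping excluded names (hypothetical helper)."""
    patterns = DEFAULT_EXCLUDES if exclude_patterns is None else exclude_patterns
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not is_excluded(d, patterns))
        if not recursive:
            # Emptying the list stops os.walk after the top-level directory.
            dirnames.clear()
        for name in sorted(filenames):
            if not is_excluded(name, patterns):
                yield os.path.join(dirpath, name)
```

Pruning dirnames in place avoids walking large trees such as node_modules or .git at all, which is usually cheaper than filtering their files afterwards.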

Methods

`load(source, metadata=None)`

Load documents from a source.

Parameters:

source (str): File path, directory, URL, or glob pattern
metadata (dict, optional): Additional metadata to attach to documents

Returns: List[Document] - List of loaded documents

`can_handle(source)`

Check if the reader can handle the given source.

Parameters:

source (str): Source to check

Returns: bool - True if the reader can handle this source
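
The two methods above form a small contract that any reader can follow. The sketch below is a self-contained illustration of that contract, under stated assumptions: the Document dataclass is a hypothetical stand-in for the library's document type (only the content and metadata attributes used elsewhere on this page are modeled), PlainTextReader is not the library's TextReader, and letting the real path always win for the "source" key is a choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class PlainTextReader:
    """Illustrative reader that follows the load/can_handle contract."""

    extensions = (".txt", ".text", ".log")

    def can_handle(self, source: str) -> bool:
        # Case-insensitive extension check; str.endswith accepts a tuple.
        return source.lower().endswith(self.extensions)

    def load(
        self, source: str, metadata: Optional[Dict[str, Any]] = None
    ) -> List[Document]:
        if not self.can_handle(source):
            raise ValueError(f"Unsupported source: {source}")
        with open(source, "r", encoding="utf-8") as fh:
            text = fh.read()
        # Merge caller metadata first, then record the actual source path.
        merged = {**(metadata or {}), "source": source}
        return [Document(content=text, metadata=merged)]
```

Calling can_handle before load lets a dispatcher try several readers in turn and pick the first one that accepts the source.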

Example: Custom Metadata

from praisonai.adapters import AutoReader

reader = AutoReader()

# Add custom metadata to loaded documents
docs = reader.load(
    "technical_docs/",
    metadata={
        "category": "technical",
        "version": "2.0",
        "author": "engineering"
    }
)

for doc in docs:
    print(f"Source: {doc.metadata['source']}")
    print(f"Category: {doc.metadata['category']}")

Example: URL Reading

from praisonai.adapters.readers import URLReader

reader = URLReader()
docs = reader.load("https://docs.python.org/3/tutorial/index.html")

# Content is automatically extracted from HTML
print(docs[0].content[:500])

CLI Usage

praisonai knowledge add <source>

Examples:

# Add a single file
praisonai knowledge add document.pdf

# Add all files in a directory
praisonai knowledge add ./docs/

# Add files matching a pattern
praisonai knowledge add "*.pdf"

# Add from URL
praisonai knowledge add https://example.com/page.html

Related

Vector Store Module - Store loaded documents
Retrieval Module - Retrieve documents
