Data Readers Module
Quick Start
Classes
Document
ReaderProtocol
ReaderRegistry
Utility Functions
detect_source_kind
get_file_extension
Supported File Types
Creating Custom Readers
Performance

Data Readers Module

The readers module provides a protocol-based system for loading documents from various sources into the knowledge base.

Quick Start

from praisonaiagents.knowledge.readers import (
    Document,
    ReaderProtocol,
    get_reader_registry,
    detect_source_kind,
    get_file_extension
)

# Detect source type
kind = detect_source_kind("document.pdf")  # Returns "file"
kind = detect_source_kind("https://example.com")  # Returns "url"
kind = detect_source_kind("./docs/")  # Returns "directory"

# Get file extension
ext = get_file_extension("report.pdf")  # Returns "pdf"

Classes

Document

A lightweight dataclass representing a loaded document.

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Document:
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None

Fields:

text - The document content as plain text
metadata - Optional metadata dictionary (source, title, etc.)
doc_id - Optional unique identifier
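To illustrate the fields above, here is a minimal sketch that uses a local stand-in mirroring the dataclass definition shown, so it runs without the package installed. The sample values (file name, title, identifier) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:  # local stand-in mirroring the definition above
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None


# Only `text` is required; the other fields fall back to their defaults.
minimal = Document(text="Hello, world")

# Hypothetical sample values, for illustration only.
full = Document(
    text="Quarterly results",
    metadata={"source": "report.pdf", "title": "Q3 Report"},
    doc_id="report-001",
)

# default_factory gives every instance its own metadata dict,
# so mutating one document does not leak into another.
minimal.metadata["source"] = "inline"
other = Document(text="Another document")
print(other.metadata)
```

Using `field(default_factory=dict)` rather than a shared `{}` default is what keeps the metadata of separate documents independent.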

ReaderProtocol

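A typing.Protocol defining the interface for document readers. Any class that provides a matching name, supported_extensions, and load() satisfies it; no inheritance is required.

from typing import Any, Dict, List, Optional, Protocol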

class ReaderProtocol(Protocol):
    name: str
    supported_extensions: List[str]
    
    def load(
        self,
        source: str,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs
    ) -> List[Document]:
        """Load documents from a source."""
        ...
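Because the interface is a Protocol, conformance is structural. The sketch below shows this with local stand-ins for Document and ReaderProtocol. The `@runtime_checkable` decorator is an assumption added here so that `isinstance` can be demonstrated; the library's protocol may not be declared that way. `UpperCaseReader` and `NotAReader` are hypothetical classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class Document:  # local stand-in for the library's Document
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None


@runtime_checkable  # assumption: added only to allow isinstance() below
class ReaderProtocol(Protocol):
    name: str
    supported_extensions: List[str]

    def load(
        self,
        source: str,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> List[Document]:
        ...


class UpperCaseReader:  # hypothetical reader; note there is no base class
    name = "upper"
    supported_extensions = ["txt"]

    def load(self, source, metadata=None, **kwargs):
        # Treats `source` as literal text to keep the example file-free.
        return [Document(text=source.upper(), metadata=dict(metadata or {}))]


class NotAReader:  # lacks supported_extensions and load()
    name = "broken"


reader = UpperCaseReader()
print(isinstance(reader, ReaderProtocol))
print(isinstance(NotAReader(), ReaderProtocol))
```

A runtime `isinstance` check only verifies that the members exist, not their signatures; a static type checker is what validates the `load` signature.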

ReaderRegistry

Registry for managing and discovering readers.

from praisonaiagents.knowledge.readers import get_reader_registry

registry = get_reader_registry()

# List registered readers
readers = registry.list_readers()  # ['text', 'markdown', 'json', ...]

# Get reader for extension
reader = registry.get_for_extension("pdf")

# Register a custom reader (MyCustomReader is defined under "Creating Custom Readers")
registry.register(MyCustomReader())
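To make the registry's role concrete, the following is a rough sketch of how an extension-to-reader lookup could work. `SketchRegistry` and the two fake readers are hypothetical stand-ins, not the library's ReaderRegistry. Normalizing extensions and returning `None` for unknown ones are assumptions of this sketch.

```python
from typing import Any, Dict, List, Optional


class SketchRegistry:
    """Hypothetical stand-in; not the library's ReaderRegistry."""

    def __init__(self) -> None:
        self._readers: Dict[str, Any] = {}        # reader name -> reader
        self._by_extension: Dict[str, str] = {}   # extension -> reader name

    def register(self, reader: Any) -> None:
        self._readers[reader.name] = reader
        for ext in reader.supported_extensions:
            # Assumption: extensions are stored lowercase, without a dot.
            self._by_extension[ext.lower().lstrip(".")] = reader.name

    def list_readers(self) -> List[str]:
        return sorted(self._readers)

    def get_for_extension(self, ext: str) -> Optional[Any]:
        name = self._by_extension.get(ext.lower().lstrip("."))
        # Assumption: unknown extensions yield None rather than raising.
        return self._readers.get(name) if name is not None else None


class FakeTextReader:  # hypothetical reader used only for this demo
    name = "text"
    supported_extensions = ["txt", "text"]


class FakePDFReader:  # hypothetical reader used only for this demo
    name = "pdf"
    supported_extensions = ["pdf"]


registry = SketchRegistry()
registry.register(FakeTextReader())
registry.register(FakePDFReader())

print(registry.list_readers())
print(registry.get_for_extension(".PDF").name)
print(registry.get_for_extension("xyz"))
```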

Utility Functions

detect_source_kind

Detect the type of source without importing heavy libraries.

from praisonaiagents.knowledge.readers import detect_source_kind

detect_source_kind("file.pdf")           # "file"
detect_source_kind("https://example.com") # "url"
detect_source_kind("./docs/")            # "directory"
detect_source_kind("*.pdf")              # "glob"
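A classification like the one above needs nothing beyond the standard library. The function below, `sketch_detect_source_kind`, is a hypothetical re-implementation for illustration. The order of checks and the rules themselves (scheme test, wildcard characters, trailing separator) are assumptions, not the library's actual logic.

```python
import os
from urllib.parse import urlparse


def sketch_detect_source_kind(source: str) -> str:
    """Hypothetical re-implementation, for illustration only."""
    # URLs: an http(s) scheme plus a network location.
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Glob patterns: contain shell-style wildcard characters.
    if any(ch in source for ch in "*?["):
        return "glob"
    # Directories: a trailing separator, or a directory that exists on disk.
    if source.endswith(("/", os.sep)) or os.path.isdir(source):
        return "directory"
    # Everything else is treated as a single file path.
    return "file"


for example in ("file.pdf", "https://example.com", "./docs/", "*.pdf"):
    print(example, sketch_detect_source_kind(example))
```

Checking for a URL first keeps query strings containing `?` from being mistaken for glob patterns.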

get_file_extension

Extract file extension from a path or URL.

from praisonaiagents.knowledge.readers import get_file_extension

get_file_extension("report.pdf")                    # "pdf"
get_file_extension("https://example.com/doc.html")  # "html"
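The same behavior can be approximated with `urllib.parse` and `os.path`. `sketch_get_file_extension` below is a hypothetical equivalent, not the library's code. Lowercasing the result and returning an empty string when there is no extension are assumptions of this sketch.

```python
import os
from urllib.parse import urlparse


def sketch_get_file_extension(source: str) -> str:
    """Hypothetical equivalent, for illustration only."""
    parsed = urlparse(source)
    # For URLs, look only at the path so query strings are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    _, ext = os.path.splitext(path)
    # Assumption: no leading dot, lowercase, "" when there is no extension.
    return ext.lstrip(".").lower()


print(sketch_get_file_extension("report.pdf"))
print(sketch_get_file_extension("https://example.com/doc.html?lang=en"))
```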

Supported File Types

Extension	Reader	Description
txt, text	TextReader	Plain text files
md, markdown	MarkdownReader	Markdown documents
json, jsonl	JSONReader	JSON and JSON Lines
csv, tsv	CSVReader	Tabular data
html, htm	HTMLReader	Web pages
pdf	PDFReader	PDF documents
docx, doc	DocxReader	Word documents
xlsx, xls	ExcelReader	Spreadsheets
pptx, ppt	PowerPointReader	Presentations

Creating Custom Readers

from praisonaiagents.knowledge.readers import Document, get_reader_registry
from typing import List, Dict, Any, Optional

class MyCustomReader:
    name = "custom"
    supported_extensions = ["custom", "cst"]
    
    def load(
        self,
        source: str,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs
    ) -> List[Document]:
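        # Read the whole file as text; an explicit encoding avoids platform-dependent defaults
        with open(source, 'r', encoding='utf-8') as f:
            content = f.read()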
        
        return [Document(
            text=content,
            metadata={"source": source, **(metadata or {})}
        )]

# Register the reader
registry = get_reader_registry()
registry.register(MyCustomReader())
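The reader can also be exercised directly, without going through the registry. The sketch below repeats the reader with a local stand-in for Document so that it runs on its own. The temporary file name and the `author` metadata key are made up for the demo.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:  # local stand-in for the library's Document
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None


class MyCustomReader:  # same shape as the reader defined above
    name = "custom"
    supported_extensions = ["custom", "cst"]

    def load(
        self,
        source: str,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> List[Document]:
        with open(source, "r", encoding="utf-8") as f:
            content = f.read()
        # Caller-supplied metadata is merged on top of the source path.
        return [Document(text=content, metadata={"source": source, **(metadata or {})})]


# Write a throwaway file, then load it through the reader.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.custom")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello from a custom file")
    docs = MyCustomReader().load(path, metadata={"author": "demo"})

print(docs[0].text)
print(sorted(docs[0].metadata))
```

Because caller metadata is unpacked last, a caller-supplied `source` key would override the file path.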

Performance

Zero heavy imports at module level
Readers are lazy-loaded when first accessed
No chromadb, torch, or sentence_transformers dependencies
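One common way to get this kind of lazy loading is to register factories and construct each reader only on first access. The sketch below illustrates that general pattern. `LazyReaderTable` and `make_csv_reader` are hypothetical, and the standard-library `csv` module stands in for a heavy dependency. It is not a description of how the library implements this.

```python
import importlib
from typing import Any, Callable, Dict, List


class LazyReaderTable:
    """Hypothetical illustration of lazy loading; not the library's code."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register_lazy(self, name: str, factory: Callable[[], Any]) -> None:
        # Registration stores only the factory; nothing is imported or built.
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Build on first access, then reuse the cached object.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


created: List[str] = []  # records when the factory actually runs


def make_csv_reader() -> Any:
    # The import is deferred until first use.
    csv_module = importlib.import_module("csv")
    created.append("csv")
    return csv_module.reader


table = LazyReaderTable()
table.register_lazy("csv", make_csv_reader)
created_before_access = list(created)  # still empty at this point

first = table.get("csv")
second = table.get("csv")
print(created_before_access, created, first is second)
```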
