OCR and text extraction agent using vision models.
Simple
Agents: 1 — Single agent with vision capabilities extracts text from images.
Workflow
- Receive image with text
- Process with vision model
- Extract and return text content
Setup
pip install praisonaiagents praisonai
export OPENAI_API_KEY="your-key"
Run — Python
from praisonaiagents import Agent, Task, AgentTeam

agent = Agent(
    name="OCRAgent",
    instructions="Extract all text from images preserving layout.",
    llm="gpt-4o-mini"
)

task = Task(
    description="Extract all text from this document",
    expected_output="Extracted text",
    agent=agent,
    images=["document.jpg"]
)

agents = AgentTeam(agents=[agent], tasks=[task])
result = agents.start()
print(result)
Run — CLI
praisonai "Extract text from this document" --image document.jpg
Run — agents.yaml
framework: praisonai
topic: Text Extraction
roles:
  ocr_agent:
    role: OCR Specialist
    goal: Extract text from images
    backstory: You are an expert in text extraction
    llm: gpt-4o-mini
    tasks:
      extract:
        description: Extract all text from this document
        expected_output: Extracted text
        images:
          - document.jpg
Serve API
from praisonaiagents import Agent

agent = Agent(
    name="OCRAgent",
    instructions="You are an OCR expert.",
    llm="gpt-4o-mini"
)

agent.launch(port=8080)
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Extract text from: https://example.com/doc.jpg"}'
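The same request can be issued from Python. The following is a minimal sketch using only the standard library; the URL, the `/chat` path, and the `message` field are copied from the curl command above, while `build_chat_request` and `send_chat` are hypothetical helper names introduced for illustration. The shape of the response body is not documented here, so the sketch returns it as raw text.

```python
import json
import urllib.request

# Assumed endpoint: host, port, and path are taken from the curl example.
CHAT_URL = "http://localhost:8080/chat"


def build_chat_request(message: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends (hypothetical helper)."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, url: str = CHAT_URL, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the agent running on port 8080, calling `send_chat("Extract text from: https://example.com/doc.jpg")` should be equivalent to the curl command.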
Advanced Workflow (All Features)
Agents: 1 — Single agent with memory, persistence, structured output, and session resumability.
Workflow
- Initialize session for document tracking
- Configure SQLite persistence for extraction history
- Extract text with structured output
- Store results in memory for search
- Resume session for document comparison
Setup
pip install praisonaiagents praisonai pydantic
export OPENAI_API_KEY="your-key"
Run — Python
from praisonaiagents import Agent, Task, AgentTeam, Session
from pydantic import BaseModel


class ExtractedDocument(BaseModel):
    filename: str
    text: str
    sections: list[str]
    word_count: int


session = Session(session_id="ocr-001", user_id="user-1")

agent = Agent(
    name="OCRAgent",
    instructions="Extract text and return structured results.",
    llm="gpt-4o-mini",
    memory=True
)

task = Task(
    description="Extract all text from this document",
    expected_output="Structured extraction",
    agent=agent,
    images=["document.jpg"],
    output_pydantic=ExtractedDocument
)

agents = AgentTeam(
    agents=[agent],
    tasks=[task],
    memory=True
)

result = agents.start()
print(result)
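Structured output from a vision model is worth sanity-checking, because fields such as `word_count` are produced by the model rather than computed. The sketch below assumes the extraction is available as a JSON string with the four fields of `ExtractedDocument`; how to obtain that string from `result` is not shown here. To keep it dependency-free it uses a standard-library dataclass as a stand-in for the Pydantic model, and `ExtractedDocumentCheck`, `parse_extraction`, and `word_count_matches` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedDocumentCheck:
    """Stdlib stand-in mirroring the fields of ExtractedDocument (hypothetical)."""
    filename: str
    text: str
    sections: list
    word_count: int


def parse_extraction(raw_json: str) -> ExtractedDocumentCheck:
    """Parse a JSON string and verify it has the expected fields and types."""
    data = json.loads(raw_json)
    # Raises TypeError if a field is missing or an unexpected field is present.
    doc = ExtractedDocumentCheck(**data)
    if not isinstance(doc.filename, str) or not isinstance(doc.text, str):
        raise TypeError("filename and text must be strings")
    if not isinstance(doc.sections, list) or not all(
        isinstance(section, str) for section in doc.sections
    ):
        raise TypeError("sections must be a list of strings")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(doc.word_count, bool) or not isinstance(doc.word_count, int):
        raise TypeError("word_count must be an integer")
    return doc


def word_count_matches(doc: ExtractedDocumentCheck) -> bool:
    """Compare the reported word count with a whitespace split of the text."""
    return doc.word_count == len(doc.text.split())
```

A mismatch from `word_count_matches` does not necessarily mean the extracted text is wrong; it only signals that the reported count should not be trusted without recomputing it.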
Run — CLI
praisonai "Extract text" --image document.jpg --memory --verbose
Run — agents.yaml
framework: praisonai
topic: Text Extraction
memory: true
memory_config:
  provider: sqlite
  db_path: ocr.db
roles:
  ocr_agent:
    role: OCR Specialist
    goal: Extract text with structured output
    backstory: You are an expert in text extraction
    llm: gpt-4o-mini
    memory: true
    tasks:
      extract:
        description: Extract all text from this document
        expected_output: Structured extraction
        images:
          - document.jpg
        output_json:
          filename: string
          text: string
          sections: array
          word_count: number
praisonai agents.yaml --verbose
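After a run, one way to confirm that something was persisted is to open the SQLite file directly. This is a rough sketch that assumes the `db_path` value `ocr.db` from the YAML above; the table layout is not documented here, so the hypothetical helper `list_tables` only reports which tables exist and how many rows each holds.

```python
import sqlite3


def list_tables(db_path: str) -> dict:
    """Return {table_name: row_count} for every user table in the SQLite file.

    Note: sqlite3.connect creates an empty file if db_path does not exist yet.
    """
    connection = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier; the names come from the database itself.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = connection.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        connection.close()
```

Calling `list_tables("ocr.db")` before and after a run gives a quick view of whether row counts changed.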
Serve API
from praisonaiagents import Agent

agent = Agent(
    name="OCRAgent",
    instructions="Extract text and return structured results.",
    llm="gpt-4o-mini",
    memory=True
)

agent.launch(port=8080)
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Extract text", "session_id": "ocr-001"}'
Monitor / Verify
praisonai "test ocr" --image test.jpg --verbose
Cleanup
Features Demonstrated
| Feature | Implementation |
|---|---|
| Workflow | Vision-based text extraction |
| DB Persistence | SQLite via memory_config |
| Observability | --verbose flag |
| Resumability | Session with session_id |
| Structured Output | Pydantic ExtractedDocument model |