# Image to Text Agent

> Learn how to create AI agents for converting images to textual descriptions and extracting text from images.

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
flowchart LR
    In[Image] --> Agent[OCR Agent]
    Agent --> Out[Extracted Text]
    
    style In fill:#8B0000,color:#fff
    style Agent fill:#2E8B57,color:#fff
    style Out fill:#8B0000,color:#fff
```

This agent uses a vision-capable model to perform OCR and extract text from images.

***

## Simple

**Agents: 1** — Single agent with vision capabilities extracts text from images.

### Workflow

1. Receive image with text
2. Process with vision model
3. Extract and return text content
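To make step 2 more concrete: vision endpoints such as OpenAI's chat API accept images as base64 data URLs. The sketch below shows that encoding step using only the standard library. It is an illustration, not PraisonAI's internal code; `image_to_data_url` is a hypothetical helper, and the framework is assumed to handle this for you when you pass `images=[...]`.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"
    # Read the raw bytes and base64-encode them as ASCII text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Calling `image_to_data_url("document.jpg")` on an existing file returns a string beginning with `data:image/jpeg;base64,`.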

### Setup

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
pip install praisonaiagents praisonai
export OPENAI_API_KEY="your-key"
```

### Run — Python

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonaiagents import Agent, Task, AgentTeam

# Vision-capable agent that performs the extraction
agent = Agent(
    name="OCRAgent",
    instructions="Extract all text from images preserving layout.",
    llm="gpt-4o-mini"
)

# The task attaches the image to read via `images`
task = Task(
    description="Extract all text from this document",
    expected_output="Extracted text",
    agent=agent,
    images=["document.jpg"]
)

# Run the single-agent team and print the result
agents = AgentTeam(agents=[agent], tasks=[task])
result = agents.start()
print(result)
```

### Run — CLI

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai "Extract text from this document" --image document.jpg
```

### Run — agents.yaml

```yaml theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
framework: praisonai
topic: Text Extraction
roles:
  ocr_agent:
    role: OCR Specialist
    goal: Extract text from images
    backstory: You are an expert in text extraction
    llm: gpt-4o-mini
    tasks:
      extract:
        description: Extract all text from this document
        expected_output: Extracted text
        images:
          - document.jpg
```

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai agents.yaml
```

### Serve API

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonaiagents import Agent

agent = Agent(
    name="OCRAgent",
    instructions="You are an OCR expert.",
    llm="gpt-4o-mini"
)

agent.launch(port=8080)
```

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Extract text from: https://example.com/doc.jpg"}'
```
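The same request can be made from Python with the standard library. This is a minimal sketch that mirrors the `curl` call above: the `/chat` path, port, and `message` field come from that example, while `build_chat_request` and `send_chat` are hypothetical helper names. The response format is not documented here, so the sketch returns the raw body as text.

```python
import json
import urllib.request


def build_chat_request(
    message: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the /chat endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server from the previous block running, `send_chat("Extract text from: https://example.com/doc.jpg")` sends the same payload as the `curl` command.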

***

## Advanced Workflow (All Features)

**Agents: 1** — Single agent with memory, persistence, structured output, and session resumability.

### Workflow

1. Initialize session for document tracking
2. Configure SQLite persistence for extraction history
3. Extract text with structured output
4. Store results in memory for search
5. Resume session for document comparison
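The comparison in step 5 is not spelled out in this recipe. One simple way to compare two extractions, assuming you have both as plain strings, is a line-based diff with Python's `difflib`. The helper name `diff_extractions` is hypothetical and is not part of PraisonAI.

```python
import difflib


def diff_extractions(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines describing how two extractions differ."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # lines were split without newlines, so add none back
        )
    )
```

Identical inputs produce an empty list; changed lines appear prefixed with `-` (previous) and `+` (current).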

### Setup

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
pip install praisonaiagents praisonai pydantic
export OPENAI_API_KEY="your-key"
```

### Run — Python

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonaiagents import Agent, Task, AgentTeam, Session
from pydantic import BaseModel

class ExtractedDocument(BaseModel):
    filename: str
    text: str
    sections: list[str]
    word_count: int

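# Session identifies this run for later resumption; this example creates it
# but does not pass it to the agent or team.
session = Session(session_id="ocr-001", user_id="user-1")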

agent = Agent(
    name="OCRAgent",
    instructions="Extract text and return structured results.",
    llm="gpt-4o-mini",
    memory=True
)

task = Task(
    description="Extract all text from this document",
    expected_output="Structured extraction",
    agent=agent,
    images=["document.jpg"],
    output_pydantic=ExtractedDocument
)

agents = AgentTeam(
    agents=[agent],
    tasks=[task],
    memory=True
)

result = agents.start()
print(result)
```
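If you have the structured result as JSON text, a quick sanity check can catch missing fields or a word count that disagrees with the extracted text. The sketch below uses only the standard library and mirrors the field names of `ExtractedDocument`. `check_extraction` is a hypothetical helper, and how you obtain the JSON string from `result` is an assumption that depends on your PraisonAI version.

```python
import json

# Field names and types mirror the ExtractedDocument model above.
EXPECTED_FIELDS = {"filename": str, "text": str, "sections": list, "word_count": int}


def check_extraction(raw_json: str) -> list[str]:
    """Return a list of problems found in a JSON extraction payload."""
    data = json.loads(raw_json)
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    # Only compare counts when every field is present and well-typed.
    if not problems:
        actual = len(data["text"].split())
        if data["word_count"] != actual:
            problems.append(f"word_count is {data['word_count']}, counted {actual}")
    return problems
```

An empty list means the payload passed every check; otherwise each entry names one problem.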

### Run — CLI

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai "Extract text" --image document.jpg --memory --verbose
```

### Run — agents.yaml

```yaml theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
framework: praisonai
topic: Text Extraction
memory: true
memory_config:
  provider: sqlite
  db_path: ocr.db
roles:
  ocr_agent:
    role: OCR Specialist
    goal: Extract text with structured output
    backstory: You are an expert in text extraction
    llm: gpt-4o-mini
    memory: true
    tasks:
      extract:
        description: Extract all text from this document
        expected_output: Structured extraction
        images:
          - document.jpg
        output_json:
          filename: string
          text: string
          sections: array
          word_count: number
```

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai agents.yaml --verbose
```

### Serve API

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonaiagents import Agent

agent = Agent(
    name="OCRAgent",
    instructions="Extract text and return structured results.",
    llm="gpt-4o-mini",
    memory=True
)

agent.launch(port=8080)
```

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Extract text", "session_id": "ocr-001"}'
```

***

## Monitor / Verify

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai "test ocr" --image test.jpg --verbose
```
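To confirm that persistence wrote something, you can inspect the SQLite file directly. This sketch assumes the store lives at `ocr.db`, as set by `db_path` in the YAML example. It makes no assumptions about the schema and only lists table names; `list_tables` is a hypothetical helper.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path: str = "ocr.db") -> list[str]:
    """Return the table names in a SQLite file, or [] if the file is absent."""
    # Check first: sqlite3.connect() would otherwise create an empty file.
    if not os.path.exists(db_path):
        return []
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

An empty list means either the file does not exist yet or no tables have been created in it.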

## Cleanup

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
rm -f ocr.db
```

## Features Demonstrated

| Feature           | Implementation                     |
| ----------------- | ---------------------------------- |
| Workflow          | Vision-based text extraction       |
| DB Persistence    | SQLite via `memory_config`         |
| Observability     | `--verbose` flag                   |
| Resumability      | `Session` with `session_id`        |
| Structured Output | Pydantic `ExtractedDocument` model |

## Next Steps

* [Image Agent](/agents/image) for image analysis
* [Video Agent](/agents/video) for video content
* [Memory](/features/advanced-memory) for persistent context
