Image to Text Agent - PraisonAI Documentation

A workflow demonstrating how the Image-to-Text Agent can extract text from images and generate comprehensive descriptions.

Quick Start

Install Package

First, install the PraisonAI Agents package:

pip install praisonaiagents

Set API Key

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY=your_api_key_here
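
Before running the script, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name require_api_key is hypothetical and is not part of PraisonAI.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of praisonaiagents.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

Calling require_api_key() at the top of a script fails early with a readable message instead of surfacing later as an authentication error.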

Create Script

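Create a new file image_to_text.py. The script references document.jpg and scene.jpg by relative path, so place both images in the directory you run it from: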

from praisonaiagents import Agent, Task, PraisonAIAgents

# Create Image-to-Text Agent
image_text_agent = Agent(
    name="ImageTextConverter",
    role="Image Text Extraction Specialist",
    goal="Convert image content to textual descriptions and extract text",
    backstory="""You are an expert in OCR and image understanding.
    You excel at extracting text from images and generating detailed descriptions.""",
    llm="gpt-4o-mini",
    self_reflect=False
)

# Create text extraction task
extraction_task = Task(
    name="extract_text",
    description="Extract all text from this image and describe its layout.",
    expected_output="Extracted text and layout description",
    agent=image_text_agent,
    images=["document.jpg"]
)

# Create description task
description_task = Task(
    name="generate_description",
    description="Generate a detailed description of the image content.",
    expected_output="Comprehensive description of visual elements",
    agent=image_text_agent,
    images=["scene.jpg"]
)

# Create PraisonAIAgents instance
agents = PraisonAIAgents(
    agents=[image_text_agent],
    tasks=[extraction_task, description_task],
    process="sequential",
    verbose=1
)

# Run analysis
agents.start()
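
The script above passes local file names in images=[...]. Assuming those are files on disk (rather than URLs), a small pre-flight check can catch a missing image before any API call is made. This is a sketch; missing_images is a hypothetical helper, not a PraisonAI function.

```python
from pathlib import Path


def missing_images(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under base_dir.

    Hypothetical helper for illustration; assumes local file paths, not URLs.
    """
    base = Path(base_dir)
    # Absolute paths are left unchanged by the `/` join; relative ones
    # are resolved against base_dir.
    return [p for p in paths if not (base / p).is_file()]
```

For the Quick Start script, missing_images(["document.jpg", "scene.jpg"]) returning a non-empty list would indicate which files need to be added before calling agents.start().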

Understanding Image-to-Text Conversion

The Image-to-Text Agent combines multiple capabilities to convert visual content into textual form:

OCR Processing: Extracts text from images using optical character recognition
Layout Analysis: Understands the spatial arrangement of text and visual elements
Content Description: Generates natural language descriptions of image content
Text Formatting: Preserves text formatting and structure where possible
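
In the Quick Start script, these capabilities are requested through the wording of each task's description. One way to make that mapping explicit is to compose the description from named instructions. The sketch below is an assumption about how one might organize prompts; the capability keys, the instruction wording, and build_description are all hypothetical.

```python
# Hypothetical mapping from the four capabilities listed above to
# instruction sentences that can be joined into a Task description.
CAPABILITY_INSTRUCTIONS = {
    "ocr": "Extract all text visible in the image.",
    "layout": "Describe the spatial arrangement of text and visual elements.",
    "description": "Describe the image content in natural language.",
    "formatting": "Preserve the original text formatting and structure where possible.",
}


def build_description(capabilities):
    """Join the instructions for the requested capabilities, in the given order."""
    unknown = [c for c in capabilities if c not in CAPABILITY_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"Unknown capabilities: {unknown}")
    return " ".join(CAPABILITY_INSTRUCTIONS[c] for c in capabilities)
```

The resulting string can be passed as the description argument when constructing a Task, in place of a hand-written sentence.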

Features

Text Extraction

Advanced OCR capabilities for text extraction.

Layout Understanding

Analysis of text and content layout.

Content Description

Detailed descriptions of visual content.

Format Preservation

Maintains text formatting and structure.

Example Usage

# Example: processing a document image.
# Reuses image_text_agent and the imports from the Quick Start script.
document_task = Task(
    name="process_document",
    description="Extract text and analyze document layout",
    expected_output="Extracted text with layout information",
    agent=image_text_agent,
    images=["business_document.jpg"]
)

# Run single task
agents = PraisonAIAgents(
    agents=[image_text_agent],
    tasks=[document_task],
    process="sequential"
)
agents.start()
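
To process several documents with the same instructions, the Task arguments used above can be generated per image. The sketch below only builds plain dictionaries, so it runs without PraisonAI installed; document_task_specs is a hypothetical helper, and the field names mirror those in the example above.

```python
from pathlib import Path


def document_task_specs(image_paths):
    """Build one keyword-argument dict per image, mirroring the Task fields above.

    Hypothetical helper for illustration; not part of praisonaiagents.
    """
    specs = []
    for path in image_paths:
        # Use the file name without its extension to give each task a distinct name.
        stem = Path(path).stem
        specs.append({
            "name": f"process_{stem}",
            "description": "Extract text and analyze document layout",
            "expected_output": "Extracted text with layout information",
            "images": [str(path)],
        })
    return specs
```

Each dictionary can then be unpacked into a task, for example Task(agent=image_text_agent, **spec), and the resulting list passed as tasks to PraisonAIAgents.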

Next Steps

Learn about Prompt Chaining for complex document processing
Explore Evaluator Optimizer for improving text extraction accuracy
Check out the Image Agent for pure image analysis capabilities
