This workflow demonstrates how the Image-to-Text Agent extracts text from images and generates comprehensive descriptions of their content.

Quick Start

1

Install Package

First, install the PraisonAI Agents package:

pip install praisonaiagents

2

Set API Key

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY=your_api_key_here
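
Before running the script, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of praisonaiagents.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with an actionable message instead of a later API error.
        raise RuntimeError(
            f"{var_name} is not set; export it before running the script."
        )
    return key
```

Calling `require_api_key()` at the top of image_to_text.py would stop the script immediately if the variable was exported in a different shell session.
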
3

Create Script

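Create a new file image_to_text.py. The script refers to document.jpg and scene.jpg by relative path, so both images must be in the directory you run it from: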

from praisonaiagents import Agent, Task, PraisonAIAgents

# Create Image-to-Text Agent
image_text_agent = Agent(
    name="ImageTextConverter",
    role="Image Text Extraction Specialist",
    goal="Convert image content to textual descriptions and extract text",
    backstory="""You are an expert in OCR and image understanding.
    You excel at extracting text from images and generating detailed descriptions.""",
    llm="gpt-4o-mini",
    self_reflect=False
)

# Create text extraction task
extraction_task = Task(
    name="extract_text",
    description="Extract all text from this image and describe its layout.",
    expected_output="Extracted text and layout description",
    agent=image_text_agent,
    images=["document.jpg"]
)

# Create description task
description_task = Task(
    name="generate_description",
    description="Generate a detailed description of the image content.",
    expected_output="Comprehensive description of visual elements",
    agent=image_text_agent,
    images=["scene.jpg"]
)

# Create PraisonAIAgents instance
agents = PraisonAIAgents(
    agents=[image_text_agent],
    tasks=[extraction_task, description_task],
    process="sequential",
    verbose=1
)

# Run analysis
agents.start()
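
Because the tasks above refer to local image files, a missing file is an easy mistake to make. The sketch below checks the paths before the agents run; `missing_images` is a hypothetical helper built on the standard library, not a praisonaiagents function.

```python
from pathlib import Path


def missing_images(paths: list[str]) -> list[str]:
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # File names taken from the Quick Start script.
    missing = missing_images(["document.jpg", "scene.jpg"])
    if missing:
        print("Missing image files:", ", ".join(missing))
```

Running this check before `agents.start()` reports every missing file at once rather than failing on the first task that needs one.
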

Understanding Image-to-Text Conversion

The Image-to-Text Agent combines multiple capabilities to convert visual content into textual form:

  1. OCR Processing: Extracts text from images using optical character recognition
  2. Layout Analysis: Understands the spatial arrangement of text and visual elements
  3. Content Description: Generates natural language descriptions of image content
  4. Text Formatting: Preserves text formatting and structure where possible
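
One way to request a specific mix of these capabilities is to compose the task description from reusable instruction snippets. The sketch below is illustrative only: the dictionary keys, the instruction wording, and the `build_task_description` helper are assumptions, not part of praisonaiagents.

```python
# Illustrative instruction text for each capability listed above; the
# wording is an assumption and can be adapted freely.
CAPABILITY_PROMPTS = {
    "ocr": "Extract all text visible in the image.",
    "layout": "Describe the spatial arrangement of text and visual elements.",
    "description": "Describe the image content in natural language.",
    "formatting": "Preserve headings, lists, and line breaks where possible.",
}


def build_task_description(*capabilities: str) -> str:
    """Join the instruction text for the requested capabilities, in order."""
    unknown = [c for c in capabilities if c not in CAPABILITY_PROMPTS]
    if unknown:
        raise ValueError(f"Unknown capabilities: {unknown}")
    return " ".join(CAPABILITY_PROMPTS[c] for c in capabilities)
```

The resulting string can be passed as the `description` argument of a `Task`, for example `build_task_description("ocr", "layout")` for a document scan.
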

Features

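Text Extraction

Extracts the text that appears in an image, as in the extract_text task.

Layout Understanding

Reports how text and other content are arranged in the image.

Content Description

Generates a detailed description of the visual content, as in the generate_description task.

Format Preservation

Keeps the formatting and structure of extracted text where possible.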

Example Usage

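from praisonaiagents import Task, PraisonAIAgents

# Example: Processing a document image
# (reuses image_text_agent as defined in the Quick Start script)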
document_task = Task(
    name="process_document",
    description="Extract text and analyze document layout",
    expected_output="Extracted text with layout information",
    agent=image_text_agent,
    images=["business_document.jpg"]
)

# Run single task
agents = PraisonAIAgents(
    agents=[image_text_agent],
    tasks=[document_task],
    process="sequential"
)
agents.start()
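
The single-task pattern above extends naturally to a folder of images. The sketch below builds one set of `Task` keyword arguments per image file, reusing the argument names and text from the example; the helper `task_specs_for_folder`, the suffix list, and the folder name in the usage note are hypothetical.

```python
from pathlib import Path

# File extensions treated as images; adjust to match your own inputs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def task_specs_for_folder(folder: str) -> list[dict]:
    """Build one keyword-argument dict per image file found in folder."""
    specs = []
    # Sort for a stable, reproducible task order.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            specs.append({
                "name": f"process_{path.stem}",
                "description": "Extract text and analyze document layout",
                "expected_output": "Extracted text with layout information",
                "images": [str(path)],
            })
    return specs
```

Assuming a folder named scans, the tasks could then be created with `[Task(agent=image_text_agent, **spec) for spec in task_specs_for_folder("scans")]` and passed to `PraisonAIAgents` as in the example above.
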
