Skip to main content

ImageAgent

The ImageAgent class is a specialized agent designed for AI image generation tasks. It extends the base Agent class and provides seamless integration with various image generation models and providers.

Overview

ImageAgent simplifies the process of generating images using AI models by providing a unified interface for multiple image generation providers including OpenAI DALL-E, Stability AI, and other compatible models through the litellm library.

Basic Usage

from praisonaiagents import ImageAgent

# Create an image generation agent
artist = ImageAgent(
    role="Digital Artist",
    goal="Create stunning AI-generated artwork based on prompts",
    backstory="You are a creative AI artist capable of generating beautiful images.",
    llm="dall-e-3"  # or other supported models
)

# Generate an image
result = artist.chat("Create a serene landscape with mountains at sunset")
print(result)  # Returns the generated image URL or base64 data

Configuration Options

Core Parameters

  • llm (str): The image generation model to use (e.g., “dall-e-3”, “dall-e-2”, “stable-diffusion-v1-6-0”)
  • api_key (str, optional): API key for the image generation service
  • style (str, optional): Style of the generated image (default: “natural”). Note: This is passed to the underlying generation config
  • response_format (str, optional): Format for the response (“url” or “b64_json”, default: “url”)
  • timeout (int, optional): Timeout for image generation requests in seconds (default: 600)
  • api_version (str, optional): Optional API version (required for Azure dall-e-3)

Inherited Parameters

All parameters from the base Agent class are also available:
  • role, goal, backstory, instructions
  • tools, llm, max_iter, max_retry
  • verbose, cache, markdown

Supported Models

ImageAgent supports various image generation models through litellm:
  • OpenAI: dall-e-3, dall-e-2
  • Stability AI: stable-diffusion-v1-6-0, stable-diffusion-xl-1024-v1-0
  • Other providers: Any model supported by litellm’s image generation capabilities

Advanced Features

Custom Styling

# Create an agent with specific style preferences
artist = ImageAgent(
    role="Style-specific Artist",
    goal="Generate images in specific artistic styles",
    llm="dall-e-3",
    style="vivid",  # or "natural" for DALL-E 3
    response_format="url"
)

Async Image Generation

import asyncio

artist = ImageAgent(
    role="Async Artist",
    goal="Generate images asynchronously",
    llm="dall-e-3"
)

async def generate_images():
    # Generate multiple images concurrently
    tasks = [
        artist.achat("A futuristic city"),
        artist.achat("An underwater palace"),
        artist.achat("A magical forest")
    ]
    results = await asyncio.gather(*tasks)
    return results

# Run async generation
images = asyncio.run(generate_images())

Integration with Other Agents

from praisonaiagents import Agent, ImageAgent

# Create a writer agent
writer = Agent(
    role="Creative Writer",
    goal="Write creative descriptions for image generation",
    backstory="You excel at crafting detailed, imaginative prompts."
)

# Create an image agent
artist = ImageAgent(
    role="AI Artist",
    goal="Transform written descriptions into stunning visuals",
    llm="dall-e-3"
)

# Writer creates a prompt
prompt = writer.chat("Write a detailed description of a dreamlike fantasy landscape")

# Artist generates the image
image = artist.chat(prompt)

Error Handling

from praisonaiagents import ImageAgent

artist = ImageAgent(
    llm="dall-e-3",
    max_retry=3,  # Retry failed generations
    timeout=60    # 60 second timeout
)

try:
    result = artist.chat("Generate an image")
except Exception as e:
    print(f"Image generation failed: {e}")

Best Practices

  1. Model Selection: Choose the appropriate model based on your needs:
    • dall-e-3: Best quality and prompt adherence
    • dall-e-2: Faster and more cost-effective
    • Stable Diffusion variants: Open-source alternatives
  2. Prompt Engineering: Provide detailed, descriptive prompts for better results:
    # Good prompt
    "A serene Japanese garden with cherry blossoms, koi pond, and traditional bridge at sunset"
    
    # Less effective prompt
    "A garden"
    
  3. Response Format: Use URL format for web applications, base64 for direct processing:
    # For web display
    artist = ImageAgent(llm="dall-e-3", response_format="url")
    
    # For image processing
    artist = ImageAgent(llm="dall-e-3", response_format="b64_json")
    
  4. API Key Management: Store API keys securely using environment variables:
    import os
    
    artist = ImageAgent(
        llm="dall-e-3",
        api_key=os.getenv("OPENAI_API_KEY")
    )
    

Limitations

  • Image generation can be slow (10-30 seconds depending on the model)
  • Some models have content policy restrictions
  • Generation costs vary by model and provider
  • Output resolution depends on the model capabilities

See Also