ImageAgent
The ImageAgent class is a specialized agent designed for AI image generation tasks. It extends the base Agent class and provides seamless integration with various image generation models and providers.
Overview
ImageAgent simplifies the process of generating images using AI models by providing a unified interface for multiple image generation providers including OpenAI DALL-E, Stability AI, and other compatible models through the litellm library.
Basic Usage
Configuration Options
Core Parameters
- llm (str): The image generation model to use (e.g., “dall-e-3”, “dall-e-2”, “stable-diffusion-v1-6-0”)
- api_key (str, optional): API key for the image generation service
- style (str, optional): Style of the generated image (default: “natural”). Note: This is passed to the underlying generation config
- response_format (str, optional): Format for the response (“url” or “b64_json”, default: “url”)
- timeout (int, optional): Timeout for image generation requests in seconds (default: 600)
- api_version (str, optional): API version for the provider (required for Azure dall-e-3)
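To make the defaults and allowed values above concrete, here is a minimal sketch that mirrors the documented core parameters in a plain dataclass. `ImageConfigSketch` is a hypothetical stand-in and not part of the library; treating `llm` as required and rejecting non-positive timeouts are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Response formats listed in the parameter description above.
ALLOWED_RESPONSE_FORMATS = ("url", "b64_json")


@dataclass
class ImageConfigSketch:
    """Hypothetical container mirroring the documented core parameters."""

    llm: str                              # e.g. "dall-e-3" (assumed required here)
    api_key: Optional[str] = None
    style: str = "natural"                # documented default
    response_format: str = "url"          # documented default
    timeout: int = 600                    # seconds, documented default
    api_version: Optional[str] = None     # needed for Azure dall-e-3

    def __post_init__(self) -> None:
        # Fail early on values the documentation does not list.
        if self.response_format not in ALLOWED_RESPONSE_FORMATS:
            raise ValueError(
                f"response_format must be one of {ALLOWED_RESPONSE_FORMATS}"
            )
        # Assumption: a timeout only makes sense as a positive number.
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")


config = ImageConfigSketch(llm="dall-e-3", response_format="b64_json")
```

Validating in one place like this keeps a mistyped response format from surfacing only after a slow generation request has been sent.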
Inherited Parameters
All parameters from the base Agent class are also available:
- role, goal, backstory, instructions
- tools, llm, max_iter, max_retry
- verbose, cache, markdown
Supported Models
ImageAgent supports various image generation models through litellm:
- OpenAI: dall-e-3, dall-e-2
- Stability AI: stable-diffusion-v1-6-0, stable-diffusion-xl-1024-v1-0
- Other providers: Any model supported by litellm’s image generation capabilities
Advanced Features
Custom Styling
Async Image Generation
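Because a single image can take many seconds to generate, running several requests concurrently is often worthwhile. The sketch below shows the general pattern with `asyncio.gather`. `fake_generate` is a hypothetical stand-in for whatever async generation call your setup exposes; it is not the ImageAgent API, and the URLs it returns are placeholders.

```python
import asyncio
from typing import List


async def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an async image generation call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return "https://example.invalid/" + prompt.replace(" ", "-") + ".png"


async def generate_many(prompts: List[str], timeout: float = 600.0) -> List[str]:
    """Run all prompts concurrently, each bounded by its own timeout."""
    tasks = [asyncio.wait_for(fake_generate(p), timeout=timeout) for p in prompts]
    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = asyncio.run(generate_many(["a red fox", "a blue whale"]))
    print(urls)
```

The 600-second default here simply mirrors the documented `timeout` default; each request gets its own deadline rather than sharing one.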
Integration with Other Agents
Error Handling
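Image generation calls can fail for transient reasons such as timeouts or rate limits, so a bounded retry with backoff is a common pattern. The following is a minimal sketch under stated assumptions: `ImageGenerationError` and `generate_with_retry` are hypothetical names, and `generate` is any callable you supply that takes a prompt and returns a result. The real exception types raised by your provider may differ.

```python
import time
from typing import Callable


class ImageGenerationError(Exception):
    """Hypothetical error type standing in for provider failures."""


def generate_with_retry(
    generate: Callable[[str], str],
    prompt: str,
    max_retry: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call generate(prompt), retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(1, max_retry + 1):
        try:
            return generate(prompt)
        except ImageGenerationError as exc:
            last_error = exc
            if attempt < max_retry:
                # Wait base_delay, then 2x, 4x, ... between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"image generation failed after {max_retry} attempts"
    ) from last_error
```

Catching only the specific error type keeps programming mistakes (for example a `TypeError`) from being silently retried.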
Best Practices
- Model Selection: Choose the appropriate model based on your needs:
  - dall-e-3: Best quality and prompt adherence
  - dall-e-2: Faster and more cost-effective
  - Stable Diffusion variants: Open-source alternatives
- Prompt Engineering: Provide detailed, descriptive prompts for better results.
- Response Format: Use URL format for web applications, base64 for direct processing.
- API Key Management: Store API keys securely using environment variables.
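The last two practices can be sketched with the standard library alone. The helper names below are hypothetical, and `OPENAI_API_KEY` is assumed as the variable name because it is the usual convention for OpenAI models; other providers use different names.

```python
import base64
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before generating images"
        )
    return key


def image_bytes_from_b64(b64_json: str) -> bytes:
    """Decode a base64 image payload into raw bytes for direct processing."""
    return base64.b64decode(b64_json)
```

With the "url" response format you would hand the link to the browser or download it later; with "b64_json" the decoded bytes can be written to disk or passed to an image library without a second network request.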
Limitations
- Image generation can be slow (10-30 seconds depending on the model)
- Some models have content policy restrictions
- Generation costs vary by model and provider
- Output resolution depends on the model capabilities
See Also
- Agent - Base agent class
- LLM Configuration - Configure LLM providers
- Model Capabilities - Model feature comparison

