
VisionAgent

Defined in the vision_agent module.
A specialized agent for image analysis and understanding. Provides:
  • Image analysis and description
  • Multi-image comparison
  • Text extraction from images
Supported Providers:
  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo
  • Anthropic: claude-3-5-sonnet-20241022, claude-3-opus-20240229
  • Google: gemini/gemini-1.5-pro, gemini/gemini-1.5-flash
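Model identifiers are written exactly as each provider expects them. Note that the Google entries carry a `gemini/` prefix, which matches LiteLLM's provider-prefix convention. The helper below is a hypothetical sketch and is not part of praisonaiagents. It maps a model string to its provider using only the models listed above, so a bare `gemini-1.5-pro` without the prefix is rejected.

```python
# Hypothetical helper (not part of praisonaiagents): look up which provider
# serves a model string, using only the models listed on this page.
SUPPORTED_VISION_MODELS: dict[str, tuple[str, ...]] = {
    "OpenAI": ("gpt-4o", "gpt-4o-mini", "gpt-4-turbo"),
    "Anthropic": ("claude-3-5-sonnet-20241022", "claude-3-opus-20240229"),
    "Google": ("gemini/gemini-1.5-pro", "gemini/gemini-1.5-flash"),
}


def provider_for(model: str) -> str:
    """Return the provider name for a listed model, or raise ValueError."""
    for provider, models in SUPPORTED_VISION_MODELS.items():
        if model in models:
            return provider
    raise ValueError(f"{model!r} is not in the documented list of vision models")


if __name__ == "__main__":
    print(provider_for("gpt-4o-mini"))              # OpenAI
    print(provider_for("gemini/gemini-1.5-flash"))  # Google
```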

Constructor

  • name (optional)
  • instructions (optional)
  • llm (optional)
  • model (optional)
  • base_url (optional)
  • api_key (optional)
  • vision (optional)
  • verbose (Union, default: True)
No descriptions are available for these parameters.
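The reference lists the constructor parameters without types or descriptions. As a hypothetical sketch, `VisionAgentConfig` (not part of praisonaiagents) collects them in one place. The field types are assumptions; the page only marks `verbose` as a `Union` defaulting to `True`. `to_kwargs()` drops unset (`None`) values so that the library's own defaults apply.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class VisionAgentConfig:
    """Hypothetical container mirroring the documented constructor parameters.

    All parameters are optional; only `verbose` has a documented default (True).
    Types are assumptions, since the reference does not document them.
    """
    name: Optional[str] = None
    instructions: Optional[str] = None
    llm: Optional[str] = None
    model: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    vision: Any = None
    verbose: bool = True  # listed as a Union on the reference page

    def to_kwargs(self) -> dict[str, Any]:
        """Return only the parameters that are set (not None)."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Assuming the constructor accepts these as keyword arguments, the result could be passed as `VisionAgent(**VisionAgentConfig(model="gpt-4o-mini").to_kwargs())`.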

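Methods

  • describe: describe an image given a URL or file path
  • analyze: analyze an image with a custom prompt passed as prompt=
  • compare: compare a list of images
  • extract_text: extract text from an image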

Usage

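```python
from praisonaiagents import VisionAgent

# Assumes credentials for the chosen provider are available
# (for example via the api_key parameter or the provider's environment variable).

# Simple usage
agent = VisionAgent()
description = agent.describe("https://example.com/image.jpg")
print(description)

# Analyze with custom prompt
result = agent.analyze(
    "https://example.com/chart.png",
    prompt="What data does this chart show?",
)
print(result)

# Compare images (local file paths work as well as URLs)
comparison = agent.compare([
    "image1.jpg",
    "image2.jpg",
])
print(comparison)

# Extract text
text = agent.extract_text("document.png")
print(text)
```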
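The usage example passes both remote URLs and local paths such as `image1.jpg`. Vision APIs such as OpenAI's accept images either as an http(s) URL or as a base64 `data:` URL. The sketch below shows one common way to normalize both forms. It is a hypothetical helper (`to_image_source` is not a praisonaiagents function) and is not VisionAgent's internal implementation.

```python
import base64
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def to_image_source(image: str) -> str:
    """Return an http(s) URL unchanged, or encode a local file as a data URL.

    Hypothetical sketch: not VisionAgent's internal implementation.
    """
    if urlparse(image).scheme in ("http", "https"):
        return image  # remote images are passed through as-is
    path = Path(image)
    mime, _ = mimetypes.guess_type(path.name)  # e.g. "image/png" for .png
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {image!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Inlining local files this way avoids hosting them, at the cost of a request payload roughly a third larger than the raw file because of base64 encoding.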