Quick Start
User Interaction Flow
Configuration Levels
What You Can Do
| Task | Example |
|---|---|
| Describe images | ”What is in this photo?” |
| Read text (OCR) | “What does the sign say?” |
| Compare images | ”What changed between these?” |
| Identify objects | ”List everything you see” |
API Reference
VisionConfig
Complete configuration options
VisionAgent
Full class documentation
Best Practices
Use vision-capable models
Use vision-capable models
Use GPT-4o, Claude 3, or Gemini Pro Vision for image analysis.
Be specific in questions
Be specific in questions
“What text is on the document?” works better than “What is this?”
Use high detail for text
Use high detail for text
Set
detail: 'high' when reading small text or documents.
