# Multi-Modal Agent

> Build agents that work with images, PDFs, and files

Build agents that can process and understand images, PDFs, audio, and other file types.

## Quick Start

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { Agent, createImagePart } from 'praisonai-ts';

const agent = new Agent({
  name: 'VisionAgent',
  instructions: 'You analyze images and documents.',
  model: 'gpt-4o', // Vision-capable model
});

// Analyze an image
const response = await agent.chat([
  { type: 'text', text: 'What do you see in this image?' },
  createImagePart('https://example.com/image.jpg'),
]);

console.log(response);
```

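## Image Analysis

The remaining examples on this page reuse the vision-capable `agent` from Quick Start unless they create their own.
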
### From URL

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Describe this image' },
  createImagePart('https://example.com/photo.jpg'),
]);
```

### From Base64

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createImagePart } from 'praisonai-ts';
import fs from 'fs';

const imageData = fs.readFileSync('./image.png');
const base64 = imageData.toString('base64');

const response = await agent.chat([
  { type: 'text', text: 'What is in this image?' },
  createImagePart(`data:image/png;base64,${base64}`),
]);
```
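
The example above hardcodes `image/png` in the data URL. When the format is not known in advance, a minimal sketch like the following can detect it from the file's leading "magic" bytes before building the data URL. `detectImageMime`, `toDataUrl`, and `imageFileToDataUrl` are hypothetical helpers, not part of praisonai-ts, and only PNG, JPEG, GIF, and WebP signatures are checked.

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical helper (not part of praisonai-ts): sniff the image format
// from well-known file signatures. Returns undefined for unknown formats.
function detectImageMime(bytes: Uint8Array): string | undefined {
  const startsWith = (sig: number[], offset = 0) =>
    sig.every((b, i) => bytes[offset + i] === b);

  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return 'image/gif'; // "GIF8"
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return 'image/webp';
  }
  return undefined;
}

// Hypothetical helper: encode raw bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Read a local image and return a data URL, failing loudly on unknown formats.
function imageFileToDataUrl(path: string): string {
  const bytes = readFileSync(path);
  const mime = detectImageMime(bytes);
  if (!mime) throw new Error(`Unrecognized image format: ${path}`);
  return toDataUrl(bytes, mime);
}
```

The result can then be passed to `createImagePart` in place of the hand-built template string.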

### From File Path

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Analyze this screenshot' },
  createImagePart('./screenshot.png'), // Local file path
]);
```
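
The three subsections above pass a remote URL, a base64 data URL, and a local path to the same `createImagePart` function. How praisonai-ts tells these apart internally is not described here; the hypothetical `classifyImageSource` below is only a sketch of one way such input could be distinguished, for example when validating user-supplied values before calling the agent.

```typescript
type ImageSource = 'url' | 'data-url' | 'path';

// Hypothetical helper: classify the string form of an image reference.
function classifyImageSource(input: string): ImageSource {
  if (input.startsWith('data:')) return 'data-url';
  try {
    // new URL() throws for relative strings such as "./screenshot.png".
    const { protocol } = new URL(input);
    if (protocol === 'http:' || protocol === 'https:') return 'url';
  } catch {
    // Not an absolute URL: fall through and treat it as a filesystem path.
  }
  return 'path';
}
```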

## PDF Processing

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { Agent, createPdfPart } from 'praisonai-ts';

const agent = new Agent({
  name: 'DocumentAgent',
  instructions: 'You analyze PDF documents.',
  model: 'gpt-4o',
});

const response = await agent.chat([
  { type: 'text', text: 'Summarize this document' },
  createPdfPart('./report.pdf'),
]);
```
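
Before attaching a PDF, it can help to confirm that the file really is a PDF and is not larger than you intend to send. A minimal sketch follows; `isPdf` and `assertAttachablePdf` are hypothetical helpers, and the `maxBytes` default is an arbitrary placeholder rather than any provider's documented limit.

```typescript
// Well-formed PDF files normally begin with the ASCII header "%PDF-".
const PDF_HEADER = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

// Hypothetical helper: check the leading bytes for the PDF header.
function isPdf(bytes: Uint8Array): boolean {
  return PDF_HEADER.every((b, i) => bytes[i] === b);
}

// Hypothetical pre-flight check; maxBytes is a placeholder, not a provider limit.
function assertAttachablePdf(bytes: Uint8Array, maxBytes = 10 * 1024 * 1024): void {
  if (!isPdf(bytes)) throw new Error('Not a PDF: missing %PDF- header');
  if (bytes.byteLength > maxBytes) {
    throw new Error(`PDF is ${bytes.byteLength} bytes; limit is ${maxBytes}`);
  }
}
```

In practice you would read the file with `fs.readFileSync('./report.pdf')`, run the check, and only then call `createPdfPart`.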

## File Attachments

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createFilePart } from 'praisonai-ts';

// Text file
const response = await agent.chat([
  { type: 'text', text: 'Review this code' },
  createFilePart('./code.ts', 'text/typescript'),
]);

// CSV data
const response2 = await agent.chat([
  { type: 'text', text: 'Analyze this data' },
  createFilePart('./data.csv', 'text/csv'),
]);
```
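
In the examples above, `createFilePart` receives the MIME type as its second argument. When attaching many files, a small lookup keyed by extension avoids repeating it by hand. `mimeTypeFor` and the table entries are illustrative assumptions; extend them to the types your model actually accepts.

```typescript
import { extname } from 'node:path';

// Hypothetical extension-to-MIME table for text-like attachments.
const ATTACHMENT_MIME_TYPES: Record<string, string> = {
  '.ts': 'text/typescript',
  '.js': 'text/javascript',
  '.csv': 'text/csv',
  '.md': 'text/markdown',
  '.json': 'application/json',
  '.txt': 'text/plain',
};

// Look up the MIME type by (case-insensitive) extension.
// Unknown extensions fall back to the generic binary type.
function mimeTypeFor(path: string): string {
  return ATTACHMENT_MIME_TYPES[extname(path).toLowerCase()] ?? 'application/octet-stream';
}
```

Usage would then look like `createFilePart(file, mimeTypeFor(file))`.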

## Multi-Modal Messages

Combine multiple content types in a single message:

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createMultimodalMessage } from 'praisonai-ts';

const message = createMultimodalMessage([
  { type: 'text', text: 'Compare these two images:' },
  { type: 'image', url: 'https://example.com/image1.jpg' },
  { type: 'image', url: 'https://example.com/image2.jpg' },
]);

const response = await agent.chat(message);
```
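
The object literals passed to `createMultimodalMessage` follow a `type`-tagged shape. The sketch below shows how that shape could be modelled as a TypeScript discriminated union, plus a helper that builds the comparison message for any number of images. `TextPart`, `ImageUrlPart`, `ContentPart`, `comparisonParts`, and `describePart` are assumptions inferred from the example above, not types exported by praisonai-ts.

```typescript
// Hypothetical types mirroring the literals in the example above.
type TextPart = { type: 'text'; text: string };
type ImageUrlPart = { type: 'image'; url: string };
type ContentPart = TextPart | ImageUrlPart;

// Build one text instruction followed by one image part per URL.
function comparisonParts(urls: string[]): ContentPart[] {
  if (urls.length < 2) throw new Error('Need at least two images to compare');
  return [
    { type: 'text', text: `Compare these ${urls.length} images:` },
    ...urls.map((url): ImageUrlPart => ({ type: 'image', url })),
  ];
}

// The `type` discriminant lets TypeScript narrow each part safely.
function describePart(part: ContentPart): string {
  switch (part.type) {
    case 'text':
      return `text: ${part.text}`;
    case 'image':
      return `image: ${part.url}`;
    default: {
      // Compile-time guard: fails to type-check if a new part type is added.
      const unreachable: never = part;
      throw new Error(`Unknown part: ${JSON.stringify(unreachable)}`);
    }
  }
}
```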

## Image Generation

Generate images with DALL-E or other models:

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { aiGenerateImage } from 'praisonai-ts';

const result = await aiGenerateImage({
  model: 'dall-e-3',
  prompt: 'A futuristic city with flying cars',
  size: '1024x1024',
  quality: 'hd',
});

console.log(result.images[0].url);
```
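
For `dall-e-3`, OpenAI's Images API accepts a `size` of `1024x1024`, `1792x1024`, or `1024x1792` and a `quality` of `standard` or `hd`. When these values come from untyped input such as a form or CLI flag, a small validator can catch typos before a request is made. `validateDalle3Options` is a hypothetical helper; whether `aiGenerateImage` performs its own validation is not covered on this page, and the allowed values should be re-checked against OpenAI's current API reference.

```typescript
// Values documented by OpenAI for dall-e-3 (assumed current).
const DALLE3_SIZES = ['1024x1024', '1792x1024', '1024x1792'] as const;
const DALLE3_QUALITIES = ['standard', 'hd'] as const;

type Dalle3Size = (typeof DALLE3_SIZES)[number];
type Dalle3Quality = (typeof DALLE3_QUALITIES)[number];

interface Dalle3Options {
  prompt: string;
  size: Dalle3Size;
  quality: Dalle3Quality;
}

// Hypothetical runtime check; the defaults are assumed to match OpenAI's.
function validateDalle3Options(input: {
  prompt?: unknown;
  size?: unknown;
  quality?: unknown;
}): Dalle3Options {
  const { prompt, size = '1024x1024', quality = 'standard' } = input;
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('prompt must be a non-empty string');
  }
  if (!(DALLE3_SIZES as readonly unknown[]).includes(size)) {
    throw new Error(`Unsupported size for dall-e-3: ${String(size)}`);
  }
  if (!(DALLE3_QUALITIES as readonly unknown[]).includes(quality)) {
    throw new Error(`Unsupported quality for dall-e-3: ${String(quality)}`);
  }
  return { prompt, size: size as Dalle3Size, quality: quality as Dalle3Quality };
}
```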

### With Agent

```typescript theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
import { createImageAgent } from 'praisonai-ts';

const imageAgent = createImageAgent({
  model: 'dall-e-3',
  defaultSize: '1024x1024',
});

const result = await imageAgent.generate('A sunset over mountains');
console.log(result.url);
```

## Supported Models

| Model               | Provider  | Capabilities        |
| ------------------- | --------- | ------------------- |
| `gpt-4o`            | OpenAI    | Vision, Text        |
| `gpt-4o-mini`       | OpenAI    | Vision, Text        |
| `claude-3.5-sonnet` | Anthropic | Vision, Text, PDFs  |
| `claude-3-opus`     | Anthropic | Vision, Text, PDFs  |
| `gemini-1.5-pro`    | Google    | Vision, Text, Video |
| `gemini-1.5-flash`  | Google    | Vision, Text        |

## Best Practices

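1. **Use a vision-capable model** - Text-only models cannot interpret image or PDF parts; check the Supported Models table above
2. **Downscale large images** - Smaller images upload faster and, with most providers, consume fewer tokens
3. **Be specific** - State exactly what to extract or describe (for example, "list every error message in this screenshot")
4. **Handle errors** - Wrap calls in `try`/`catch`; unreachable URLs, unsupported formats, or oversized files can make a request fail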
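
The image-size advice above can be made concrete with the token calculation OpenAI has published for `gpt-4o` at high detail: the image is scaled to fit within 2048×2048, then scaled so its shortest side is 768px, and each 512px tile costs 170 tokens on top of a base of 85. The sketch below encodes those steps; the constants are OpenAI's as documented for `gpt-4o`, may change, and do not apply to other providers. `estimateGpt4oImageTokens` is a hypothetical helper, and not upscaling small images is an assumption.

```typescript
type Detail = 'low' | 'high';

// Sketch of OpenAI's published gpt-4o image token formula (constants may change).
function estimateGpt4oImageTokens(width: number, height: number, detail: Detail = 'high'): number {
  const BASE_TOKENS = 85;
  const TILE_TOKENS = 170;
  const TILE_SIZE = 512;

  if (detail === 'low') return BASE_TOKENS; // low detail is a flat cost

  // Step 1: fit within a 2048x2048 square, preserving aspect ratio.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // Step 2: scale so the shortest side is 768px (assumption: never upscale).
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;

  // Step 3: count the 512px tiles needed to cover the scaled image.
  const tiles = Math.ceil(w / TILE_SIZE) * Math.ceil(h / TILE_SIZE);
  return BASE_TOKENS + TILE_TOKENS * tiles;
}
```

Under this formula a 1024×1024 image costs 765 tokens and a 2048×4096 image costs 1105, so savings come from images whose scaled size spans fewer 512px tiles.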

## Environment Variables

| Variable            | Required          | Description                       |
| ------------------- | ----------------- | --------------------------------- |
| `OPENAI_API_KEY`    | For OpenAI models | GPT-4o vision, DALL-E generation  |
| `ANTHROPIC_API_KEY` | For Claude        | Claude vision                     |
| `GOOGLE_API_KEY`    | For Gemini        | Gemini vision                     |
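
The provider keys in the table above can be checked up front so that a missing key fails with a clear message instead of an opaque provider error. This sketch maps model-name prefixes to environment variables; the prefix rules (including `dall-e` using `OPENAI_API_KEY`) are assumptions, and `requiredKeyFor` and `missingKeyFor` are hypothetical helpers rather than praisonai-ts APIs.

```typescript
// Assumed prefix-to-key mapping, mirroring the Environment Variables table.
const PROVIDER_KEYS: Array<[prefix: string, envVar: string]> = [
  ['gpt-', 'OPENAI_API_KEY'],
  ['dall-e', 'OPENAI_API_KEY'],
  ['claude-', 'ANTHROPIC_API_KEY'],
  ['gemini-', 'GOOGLE_API_KEY'],
];

// Return the env var a model is expected to need, or undefined if unknown.
function requiredKeyFor(model: string): string | undefined {
  return PROVIDER_KEYS.find(([prefix]) => model.startsWith(prefix))?.[1];
}

// Return the name of the missing key, or undefined if nothing is missing.
function missingKeyFor(
  model: string,
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  const key = requiredKeyFor(model);
  return key && !env[key] ? key : undefined;
}
```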

## Related

* [Image Agent](/docs/js/image-agent) - Dedicated image agent
* [Generate Image](/docs/js/tools/generate-image) - Image generation
