Multi-Modal Agent

Build agents that can process and understand images, PDFs, audio, and other file types.

Quick Start

import { Agent, createImagePart } from 'praisonai-ts';

const agent = new Agent({
  name: 'VisionAgent',
  instructions: 'You analyze images and documents.',
  model: 'gpt-4o', // Vision-capable model
});

// Analyze an image
const response = await agent.chat([
  { type: 'text', text: 'What do you see in this image?' },
  createImagePart('https://example.com/image.jpg'),
]);

console.log(response);

Image Analysis

From URL

import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Describe this image' },
  createImagePart('https://example.com/photo.jpg'),
]);

From Base64

import { createImagePart } from 'praisonai-ts';
import fs from 'fs';

const imageData = fs.readFileSync('./image.png');
const base64 = imageData.toString('base64');

const response = await agent.chat([
  { type: 'text', text: 'What is in this image?' },
  createImagePart(`data:image/png;base64,${base64}`),
]);
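
The MIME type in the data URL should match the actual image format, and the example above hard-codes `image/png`. A minimal sketch of a helper that picks the MIME type from the file extension is shown below. `toImageDataUrl` and its extension table are hypothetical and not part of praisonai-ts; the helper assumes the image has already been base64-encoded as in the example above.

```typescript
// Hypothetical helper, not part of praisonai-ts: builds a data URL for an
// already base64-encoded image, choosing the MIME type from the extension.
const IMAGE_MIME_TYPES: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  gif: 'image/gif',
  webp: 'image/webp',
};

function toImageDataUrl(base64: string, filePath: string): string {
  // Take everything after the last dot as the extension (case-insensitive).
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  const mime = IMAGE_MIME_TYPES[ext];
  if (mime === undefined) {
    throw new Error(`Unsupported image extension: "${ext}"`);
  }
  return `data:${mime};base64,${base64}`;
}
```

The returned string can be passed to `createImagePart` in place of the hand-written template literal.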

From File Path

import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Analyze this screenshot' },
  createImagePart('./screenshot.png'), // Local file path
]);

PDF Processing

import { Agent, createPdfPart } from 'praisonai-ts';

const agent = new Agent({
  name: 'DocumentAgent',
  instructions: 'You analyze PDF documents.',
  model: 'gpt-4o',
});

const response = await agent.chat([
  { type: 'text', text: 'Summarize this document' },
  createPdfPart('./report.pdf'),
]);

File Attachments

import { createFilePart } from 'praisonai-ts';

// Text file
const response = await agent.chat([
  { type: 'text', text: 'Review this code' },
  createFilePart('./code.ts', 'text/typescript'),
]);

// CSV data
const response2 = await agent.chat([
  { type: 'text', text: 'Analyze this data' },
  createFilePart('./data.csv', 'text/csv'),
]);

Multi-Modal Messages

Combine multiple content types in a single message:

import { createMultimodalMessage } from 'praisonai-ts';

const message = createMultimodalMessage([
  { type: 'text', text: 'Compare these two images:' },
  { type: 'image', url: 'https://example.com/image1.jpg' },
  { type: 'image', url: 'https://example.com/image2.jpg' },
]);

const response = await agent.chat(message);
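
When the list of images is only known at runtime, the parts array can be assembled programmatically. The sketch below assumes the part shapes match the object literals used above; the type names and `buildImageComparison` are hypothetical and not exports of praisonai-ts.

```typescript
// The part shapes mirror the object literals used above; the type names
// and the helper are hypothetical, not exports of praisonai-ts.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; url: string };
type ContentPart = TextPart | ImagePart;

function buildImageComparison(prompt: string, urls: string[]): ContentPart[] {
  if (urls.length < 2) {
    throw new Error('A comparison needs at least two image URLs');
  }
  // One image part per URL, in the order given.
  const images = urls.map((url): ImagePart => ({ type: 'image', url }));
  // The text prompt goes first, followed by the images.
  return [{ type: 'text', text: prompt }, ...images];
}
```

The resulting array has the same shape as the literal passed to `createMultimodalMessage` above.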

Image Generation

Generate images with DALL-E or other models:

import { aiGenerateImage } from 'praisonai-ts';

const result = await aiGenerateImage({
  model: 'dall-e-3',
  prompt: 'A futuristic city with flying cars',
  size: '1024x1024',
  quality: 'hd',
});

console.log(result.images[0].url);

With Agent

import { createImageAgent } from 'praisonai-ts';

const imageAgent = createImageAgent({
  model: 'dall-e-3',
  defaultSize: '1024x1024',
});

const result = await imageAgent.generate('A sunset over mountains');
console.log(result.url);

Supported Models

Model	Provider	Capabilities
`gpt-4o`	OpenAI	Vision, Text
`gpt-4o-mini`	OpenAI	Vision, Text
`claude-3.5-sonnet`	Anthropic	Vision, Text, PDFs
`claude-3-opus`	Anthropic	Vision, Text, PDFs
`gemini-1.5-pro`	Google	Vision, Text, Video
`gemini-1.5-flash`	Google	Vision, Text
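
To choose a model by capability in code, the table can be mirrored as a small lookup. This is a sketch only: the capability data is copied from the table above, while `MODEL_CAPABILITIES`, `modelSupports`, and `modelsWith` are hypothetical names and not part of praisonai-ts.

```typescript
// Capability data copied from the table above; the lookup helpers are
// hypothetical and not part of praisonai-ts.
type Capability = 'Vision' | 'Text' | 'PDFs' | 'Video';

const MODEL_CAPABILITIES: Record<string, Capability[]> = {
  'gpt-4o': ['Vision', 'Text'],
  'gpt-4o-mini': ['Vision', 'Text'],
  'claude-3.5-sonnet': ['Vision', 'Text', 'PDFs'],
  'claude-3-opus': ['Vision', 'Text', 'PDFs'],
  'gemini-1.5-pro': ['Vision', 'Text', 'Video'],
  'gemini-1.5-flash': ['Vision', 'Text'],
};

// Returns false for models that are not listed in the table.
function modelSupports(model: string, capability: Capability): boolean {
  const caps = MODEL_CAPABILITIES[model];
  return caps !== undefined && caps.indexOf(capability) !== -1;
}

// Lists every model in the table that has the given capability.
function modelsWith(capability: Capability): string[] {
  return Object.keys(MODEL_CAPABILITIES).filter((m) => modelSupports(m, capability));
}
```

For example, `modelsWith('Video')` returns the models the table lists as video-capable, which can then be passed as the `model` option when constructing an agent.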

Best Practices

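Use appropriate models - Not all models support vision; choose a vision-capable model such as `gpt-4o`.
Optimize image size - Resize large images before sending them to reduce token usage.
Be specific - State exactly what the agent should look for in the image.
Handle errors - Some images may fail to process, so wrap `agent.chat` calls in try/catch.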
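
One way to act on the image-size advice is a pre-flight check before the image is sent. The sketch below estimates the decoded size of a padded, standard base64 string and rejects anything over a limit. The helper names and the 4 MB default are assumptions for illustration, not praisonai-ts behavior or a documented provider limit.

```typescript
// Hypothetical pre-flight check, not part of praisonai-ts. The 4 MB default
// is an arbitrary placeholder; check your provider's actual limits.
const DEFAULT_MAX_IMAGE_BYTES = 4 * 1024 * 1024;

// Decoded size of a padded base64 string: every 4 characters encode 3 bytes,
// minus one byte per trailing '=' padding character.
function base64ByteLength(base64: string): number {
  const padding = base64.slice(-2) === '==' ? 2 : base64.slice(-1) === '=' ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

type SizeCheck = { ok: boolean; bytes: number; reason?: string };

function checkImageSize(base64: string, maxBytes: number = DEFAULT_MAX_IMAGE_BYTES): SizeCheck {
  const bytes = base64ByteLength(base64);
  if (bytes > maxBytes) {
    return {
      ok: false,
      bytes,
      reason: `Image is ${bytes} bytes; limit is ${maxBytes}. Resize before sending.`,
    };
  }
  return { ok: true, bytes };
}
```

Running the check before building the image part lets an oversized image be resized or skipped instead of failing inside the model call.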

Environment Variables

Variable	Required	Description
`OPENAI_API_KEY`	Yes	For GPT-4o vision
`ANTHROPIC_API_KEY`	For Claude	Claude vision
`GOOGLE_API_KEY`	For Gemini	Gemini vision
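
A missing key usually surfaces only when the first request fails, so it can help to check up front. The following sketch reports which variables are unset for the providers in use. The variable names come from the table above; the provider labels and `missingApiKeys` are hypothetical and not part of praisonai-ts.

```typescript
// Variable names come from the table above; the provider labels and the
// helper are hypothetical, not part of praisonai-ts.
type Provider = 'openai' | 'anthropic' | 'google';

const PROVIDER_ENV_VARS: Record<Provider, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  google: 'GOOGLE_API_KEY',
};

// Returns the names of variables that are unset or blank in `env`.
function missingApiKeys(
  providers: Provider[],
  env: Record<string, string | undefined>,
): string[] {
  return providers
    .map((p) => PROVIDER_ENV_VARS[p])
    .filter((name) => {
      const value = env[name];
      return value === undefined || value.trim() === '';
    });
}
```

In a Node.js script, pass `process.env` as the second argument and stop with a clear message if the returned array is not empty.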

Related

Image Agent - Dedicated image agent
Generate Image - Image generation
