Multi-Modal Agent
Build agents that can process and understand images, PDFs, audio, and other file types.
Quick Start
import { Agent, createImagePart } from 'praisonai-ts';

const agent = new Agent({
  name: 'VisionAgent',
  instructions: 'You analyze images and documents.',
  model: 'gpt-4o', // Vision-capable model
});

// Analyze an image
const response = await agent.chat([
  { type: 'text', text: 'What do you see in this image?' },
  createImagePart('https://example.com/image.jpg'),
]);
console.log(response);
Image Analysis
From URL
import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Describe this image' },
  createImagePart('https://example.com/photo.jpg'),
]);
From Base64
import { createImagePart } from 'praisonai-ts';
import fs from 'fs';

// Read the file and encode it as a base64 data URL
const imageData = fs.readFileSync('./image.png');
const base64 = imageData.toString('base64');

const response = await agent.chat([
  { type: 'text', text: 'What is in this image?' },
  createImagePart(`data:image/png;base64,${base64}`),
]);
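The MIME type in the data URL should match the actual file format. A minimal sketch of one way to infer it from the file extension follows; `mimeFromExtension` and `toDataUrl` are hypothetical helper names, not part of praisonai-ts, and the extension map covers only a few common formats.

```typescript
// Hypothetical helpers for illustration; not exported by praisonai-ts.
const IMAGE_MIME_TYPES: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  gif: 'image/gif',
  webp: 'image/webp',
};

// Infer the MIME type from the file extension (case-insensitive).
function mimeFromExtension(filePath: string): string {
  const ext = filePath.slice(filePath.lastIndexOf('.') + 1).toLowerCase();
  const mime = IMAGE_MIME_TYPES[ext];
  if (!mime) {
    throw new Error(`Unsupported image extension: ${ext}`);
  }
  return mime;
}

// Build a base64 data URL from raw image bytes.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}
```

The resulting string can be passed to `createImagePart` in place of the hard-coded `data:image/png;base64,...` template, for example `createImagePart(toDataUrl(fs.readFileSync(p), mimeFromExtension(p)))`.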
From File Path
import { createImagePart } from 'praisonai-ts';

const response = await agent.chat([
  { type: 'text', text: 'Analyze this screenshot' },
  createImagePart('./screenshot.png'), // Local file path
]);
PDF Processing
import { Agent, createPdfPart } from 'praisonai-ts';

const agent = new Agent({
  name: 'DocumentAgent',
  instructions: 'You analyze PDF documents.',
  model: 'gpt-4o',
});

const response = await agent.chat([
  { type: 'text', text: 'Summarize this document' },
  createPdfPart('./report.pdf'),
]);
File Attachments
import { createFilePart } from 'praisonai-ts';

// Text file
const response = await agent.chat([
  { type: 'text', text: 'Review this code' },
  createFilePart('./code.ts', 'text/typescript'),
]);

// CSV data
const response2 = await agent.chat([
  { type: 'text', text: 'Analyze this data' },
  createFilePart('./data.csv', 'text/csv'),
]);
Multi-Modal Messages
Combine multiple content types in a single message:
import { createMultimodalMessage } from 'praisonai-ts';

const message = createMultimodalMessage([
  { type: 'text', text: 'Compare these two images:' },
  { type: 'image', url: 'https://example.com/image1.jpg' },
  { type: 'image', url: 'https://example.com/image2.jpg' },
]);

const response = await agent.chat(message);
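When the number of images is not known in advance, the parts array can be built programmatically. A minimal sketch, assuming the plain `{ type, text }` and `{ type, url }` object shapes from the example above; the `ContentPart` types and `buildComparisonParts` are local, hypothetical names, not exports of praisonai-ts.

```typescript
// Local types mirroring the object shapes used in the example above.
// These are assumptions for illustration, not praisonai-ts exports.
type TextPart = { type: 'text'; text: string };
type ImageUrlPart = { type: 'image'; url: string };
type ContentPart = TextPart | ImageUrlPart;

// Build one text part followed by one image part per URL.
function buildComparisonParts(prompt: string, imageUrls: string[]): ContentPart[] {
  if (imageUrls.length === 0) {
    throw new Error('At least one image URL is required');
  }
  const images = imageUrls.map((url): ImageUrlPart => ({ type: 'image', url }));
  return [{ type: 'text', text: prompt }, ...images];
}
```

The returned array has the same shape as the literal passed to `createMultimodalMessage` above, so it could be supplied in its place.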
Image Generation
Generate images with DALL-E or other models:
import { aiGenerateImage } from 'praisonai-ts';

const result = await aiGenerateImage({
  model: 'dall-e-3',
  prompt: 'A futuristic city with flying cars',
  size: '1024x1024',
  quality: 'hd',
});
console.log(result.images[0].url);
With Agent
import { ImageAgent, createImageAgent } from 'praisonai-ts';

const imageAgent = createImageAgent({
  model: 'dall-e-3',
  defaultSize: '1024x1024',
});

const result = await imageAgent.generate('A sunset over mountains');
console.log(result.url);
Supported Models
| Model | Provider | Capabilities |
|---|---|---|
| gpt-4o | OpenAI | Vision, Text |
| gpt-4o-mini | OpenAI | Vision, Text |
| claude-3.5-sonnet | Anthropic | Vision, Text, PDFs |
| claude-3-opus | Anthropic | Vision, Text, PDFs |
| gemini-1.5-pro | Google | Vision, Text, Video |
| gemini-1.5-flash | Google | Vision, Text |
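A capability lookup can guard against sending content to a model that does not accept it. The sketch below simply transcribes the table above into a map; `MODEL_CAPABILITIES` and `supports` are hypothetical names, not part of praisonai-ts, and the entries are only as current as the table.

```typescript
type Capability = 'vision' | 'text' | 'pdf' | 'video';

// Transcribed from the Supported Models table; an illustrative map,
// not something exported by praisonai-ts.
const MODEL_CAPABILITIES: Record<string, Capability[]> = {
  'gpt-4o': ['vision', 'text'],
  'gpt-4o-mini': ['vision', 'text'],
  'claude-3.5-sonnet': ['vision', 'text', 'pdf'],
  'claude-3-opus': ['vision', 'text', 'pdf'],
  'gemini-1.5-pro': ['vision', 'text', 'video'],
  'gemini-1.5-flash': ['vision', 'text'],
};

// Returns false for models that are not listed in the map.
function supports(model: string, capability: Capability): boolean {
  const capabilities = MODEL_CAPABILITIES[model];
  return capabilities !== undefined && capabilities.includes(capability);
}
```

A caller might check `supports(model, 'vision')` before attaching an image part and fail early with a clear message otherwise.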
Best Practices
- Use a vision-capable model - Not all models accept image input
- Optimize image size - Resize large images before sending to reduce token usage
- Be specific - State exactly what the agent should look for in the image
- Handle errors - Wrap calls in try/catch, since some images may fail to process
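For the image-size advice, the target dimensions can be computed before handing the file to an image library. A minimal sketch that scales a size down so its longest side fits a limit while preserving aspect ratio; `fitWithin` is a hypothetical helper, the limit is a value you choose, and the actual pixel resizing (done by whatever image library you use) is not shown.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longest side is at most maxSide.
// Sizes already within the limit are returned unchanged (never upscaled).
function fitWithin(size: Size, maxSide: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { ...size };
  }
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

For example, a 4000x3000 image with a limit of 2000 yields 2000x1500, while an 800x600 image is left as is.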
Environment Variables
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | For GPT-4o vision |
| ANTHROPIC_API_KEY | For Claude | Claude vision |
| GOOGLE_API_KEY | For Gemini | Gemini vision |
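Checking for the relevant keys at startup gives a clearer failure than a rejected API call later. A minimal sketch based on the table above; `PROVIDER_ENV_VARS` and `missingApiKeys` are hypothetical names, not part of praisonai-ts.

```typescript
// Variable names taken from the Environment Variables table.
const PROVIDER_ENV_VARS = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  google: 'GOOGLE_API_KEY',
} as const;

type Provider = keyof typeof PROVIDER_ENV_VARS;

// Return the names of required variables that are unset or empty.
function missingApiKeys(
  providers: Provider[],
  env: Record<string, string | undefined>,
): string[] {
  return providers
    .map((provider) => PROVIDER_ENV_VARS[provider])
    .filter((name) => !env[name]);
}
```

In a Node.js application this could be called as `missingApiKeys(['openai'], process.env)`, throwing if the returned list is not empty.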