Agents can read text from images such as receipts, documents, signs, and screenshots.

Quick Start

Step 1: Extract Text from Image

import { Agent } from 'praisonai';

const agent = new Agent({
  instructions: 'Extract all text from images',
  llm: 'gpt-4o'  // Vision-capable model
});

// Top-level await requires an ES module (or wrap this in an async function)
const response = await agent.chat([
  { role: 'user', content: [
    { type: 'text', text: 'What text is on this receipt?' },
    { type: 'image', url: 'https://example.com/receipt.jpg' }
  ]}
]);
console.log(response);
// "Store: Coffee Shop, Total: $4.50..."
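The message passed to agent.chat above combines a text part and an image part in one user message. The sketch below builds that array from a prompt and an image source. The names buildImageMessage, ImageSource, TextPart, ImagePart, and ChatMessage are hypothetical; the url and path fields only mirror the two snippets on this page, so check PraisonAI's API reference for the exact shape it accepts.

```typescript
// Hypothetical types mirroring the content parts used in the snippets on this page.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; url: string } | { type: 'image'; path: string };
type ChatMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Assumption: an image is either a remote URL or a local file path.
type ImageSource = { url: string } | { path: string };

// Hypothetical helper: wrap one instruction and one image into a single user message.
function buildImageMessage(prompt: string, image: ImageSource): ChatMessage[] {
  const imagePart: ImagePart =
    'url' in image
      ? { type: 'image', url: image.url }
      : { type: 'image', path: image.path };
  return [{ role: 'user', content: [{ type: 'text', text: prompt }, imagePart] }];
}

const messages = buildImageMessage('What text is on this receipt?', {
  url: 'https://example.com/receipt.jpg',
});
console.log(JSON.stringify(messages));
```

Keeping the prompt and the image in one message lets the model read the instruction in the context of that specific image.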
Step 2: Extract Structured Data

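import { Agent } from 'praisonai';

const agent = new Agent({
  instructions: 'Extract data as JSON',
  llm: 'gpt-4o',  // Vision-capable model, as in step 1
  outputFormat: 'json'
});

const result = await agent.chat([
  { role: 'user', content: [
    { type: 'text', text: 'Extract items and prices from this receipt' },
    { type: 'image', path: './receipt.jpg' }  // Local file instead of a URL
  ]}
]);
console.log(result);
// { items: [...], total: 4.50, date: '...' }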
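Because the model produces the JSON, it is worth validating it before use. A minimal sketch, assuming the response arrives as a JSON string shaped like the comment above; the Receipt and ReceiptItem types, the name and price item fields, and the parseReceipt and totalMatchesItems helpers are hypothetical, not part of PraisonAI.

```typescript
// Hypothetical shapes; the item fields are assumptions, not PraisonAI's schema.
interface ReceiptItem {
  name: string;
  price: number;
}

interface Receipt {
  items: ReceiptItem[];
  total: number;
  date: string;
}

// Parse the model's JSON text and reject anything that does not match Receipt.
function parseReceipt(raw: string): Receipt {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null) {
    throw new Error('Expected a JSON object');
  }
  const { items, total, date } = data as Record<string, unknown>;
  if (!Array.isArray(items) || typeof total !== 'number' || typeof date !== 'string') {
    throw new Error('Missing or mistyped items, total, or date');
  }
  const parsedItems: ReceiptItem[] = items.map((item, i) => {
    const { name, price } = (item ?? {}) as Record<string, unknown>;
    if (typeof name !== 'string' || typeof price !== 'number') {
      throw new Error(`Item ${i} needs a string name and a numeric price`);
    }
    return { name, price };
  });
  return { items: parsedItems, total, date };
}

// OCR can misread digits, so cross-check the total against the item sum (in cents).
function totalMatchesItems(receipt: Receipt): boolean {
  const cents = receipt.items.reduce((sum, item) => sum + Math.round(item.price * 100), 0);
  return cents === Math.round(receipt.total * 100);
}

const receipt = parseReceipt(
  '{"items":[{"name":"Latte","price":3.25},{"name":"Muffin","price":1.25}],"total":4.5,"date":"2024-05-01"}'
);
console.log(receipt.total, totalMatchesItems(receipt)); // 4.5 true
```

Comparing amounts in whole cents avoids floating-point rounding errors when summing prices.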


Configuration Levels

import { Agent } from 'praisonai';

// Level 1: Boolean - enable vision on a vision-capable model
const basicAgent = new Agent({
  llm: 'gpt-4o',
  vision: true
});

// Level 2: String - high detail for small text
const detailedAgent = new Agent({
  llm: 'gpt-4o',
  vision: 'high'
});

// Level 3: Object - full options
const fullAgent = new Agent({
  llm: 'gpt-4o',
  vision: {
    detail: 'high',
    ocr: true,
    language: 'en'
  }
});
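The three levels can be read as progressively more explicit forms of one setting. The sketch below normalizes any of them into a single object, assuming that true means "on with default detail" and that a string selects the detail level. VisionConfig, VisionOption, normalizeVision, the enabled field, and the defaults are hypothetical; the 'low' | 'high' | 'auto' detail values are borrowed from OpenAI's image detail parameter and are not confirmed for PraisonAI.

```typescript
// Hypothetical full form of the vision option (Level 3).
interface VisionConfig {
  enabled: boolean;
  detail: 'low' | 'high' | 'auto';
  ocr: boolean;
  language?: string;
}

// Level 1: boolean, Level 2: detail string, Level 3: partial object.
type VisionOption = boolean | VisionConfig['detail'] | Partial<VisionConfig>;

// Assumed defaults; PraisonAI's real defaults may differ.
const DEFAULTS: VisionConfig = { enabled: false, detail: 'auto', ocr: false };

function normalizeVision(option: VisionOption): VisionConfig {
  if (typeof option === 'boolean') {
    // Level 1: only switches vision on or off.
    return { ...DEFAULTS, enabled: option };
  }
  if (typeof option === 'string') {
    // Level 2: picks the detail level and implies vision is on.
    return { ...DEFAULTS, enabled: true, detail: option };
  }
  // Level 3: overrides individual fields; passing an object implies vision is on.
  return { ...DEFAULTS, enabled: true, ...option };
}

console.log(normalizeVision('high'));
console.log(normalizeVision({ detail: 'high', ocr: true, language: 'en' }));
```

Collapsing every form into one object means downstream code only handles a single shape, however the option was written.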

Common Uses

| Use case | Example |
|---|---|
| Receipts | Extract items, totals, dates |
| Business cards | Get contact information |
| Documents | Digitize scanned papers |
| Screenshots | Read displayed text |

API Reference

OCRConfig: complete configuration options
OCRAgent: full class documentation

Best Practices

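Set detail to 'high' (vision: 'high' or vision: { detail: 'high' }) when reading receipts, documents, or other images with small text.
Be specific: a prompt such as “Extract the total and date” works better than “read this”.
Use clear, well-lit images; they produce more accurate results.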
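One way to apply the prompt advice above is to keep a focused instruction for each row of the Common Uses table instead of a generic "read this". The UseCase type, the PROMPTS wording, and promptFor are hypothetical examples, not built-in PraisonAI presets.

```typescript
// Hypothetical use-case keys, one per row of the Common Uses table.
type UseCase = 'receipt' | 'businessCard' | 'document' | 'screenshot';

// Each prompt names the exact fields to extract rather than asking to "read this".
const PROMPTS: Record<UseCase, string> = {
  receipt: 'Extract the store name, each item with its price, the total, and the date.',
  businessCard: 'Extract the name, job title, company, phone number, and email address.',
  document: 'Transcribe all text in reading order, preserving headings and paragraphs.',
  screenshot: 'List every piece of visible text, grouped by the UI element it appears in.',
};

// Optionally append an extra requirement, such as an output format, to the focused prompt.
function promptFor(useCase: UseCase, extra?: string): string {
  return extra ? `${PROMPTS[useCase]} ${extra}` : PROMPTS[useCase];
}

console.log(promptFor('receipt', 'Return the result as JSON.'));
```

The returned string can be used as the text part of the message sent alongside the image.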