OCR - PraisonAI

Agents can read text from images - receipts, documents, signs, and screenshots.

Quick Start

Extract Text from Image

import { Agent } from 'praisonai';

const agent = new Agent({
  instructions: 'Extract all text from images',
  llm: 'gpt-4o'  // Vision-capable model
});

await agent.chat([
  { role: 'user', content: [
    { type: 'text', text: 'What text is on this receipt?' },
    { type: 'image', url: 'https://example.com/receipt.jpg' }
  ]}
]);
// "Store: Coffee Shop, Total: $4.50..."

Extract Structured Data

const agent = new Agent({
  instructions: 'Extract data as JSON',
  outputFormat: 'json'
});

await agent.chat([
  { role: 'user', content: [
    { type: 'text', text: 'Extract items and prices from this receipt' },
    { type: 'image', path: './receipt.jpg' }
  ]}
]);
// { items: [...], total: 4.50, date: '...' }

User Interaction Flow

Configuration Levels

// Level 1: Bool - Enable with vision model
const agent = new Agent({
  llm: 'gpt-4o',
  vision: true
});

// Level 2: String - High detail for small text
const agent = new Agent({
  llm: 'gpt-4o',
  vision: 'high'
});

// Level 3: Dict - Full options
const agent = new Agent({
  vision: {
    detail: 'high',
    ocr: true,
    language: 'en'
  }
});

Common Uses

Use Case	Example
Receipts	Extract items, totals, dates
Business cards	Get contact information
Documents	Digitize scanned papers
Screenshots	Read displayed text

API Reference

OCRConfig

Complete configuration options

OCRAgent

Full class documentation

Best Practices

Use high detail for small text

Set detail: 'high' when reading receipts or documents.

Be specific about what to extract

“Extract the total and date” works better than “read this”.

Use good quality images

Clear, well-lit images produce better results.

Vision

Image analysis

Files

File operations

JavaScript

​Quick Start

​User Interaction Flow

​Configuration Levels

​Common Uses

​API Reference