Extract text from PDFs and images with OCRAgent — pass a URL or base64 source and get markdown-ready text back.
Source must be a URL (https://) or base64-encoded document. Local file paths are not supported. Currently only Mistral (mistral/mistral-ocr-latest) is supported.

Quick Start

1. Extract text from a PDF

from praisonaiagents import Agent, OCRAgent

ocr = OCRAgent()
text = ocr.read("https://arxiv.org/pdf/2201.04234")

agent = Agent(name="Reader", instructions="Summarise documents clearly.")
summary = agent.start(f"Summarise this paper:\n\n{text[:4000]}")

2. Configure with OCRConfig

import os
from praisonaiagents import OCRAgent, OCRConfig

config = OCRConfig(
    pages=[0, 1],
    timeout=300,
    api_key=os.getenv("MISTRAL_API_KEY"),
)

ocr = OCRAgent(ocr=config)
result = ocr.extract("https://arxiv.org/pdf/2201.04234")

for page in result.pages:
    print(f"Page {page.index}: {page.markdown[:100]}")

3. Async extraction

import asyncio
from praisonaiagents import OCRAgent

async def main():
    ocr = OCRAgent()
    text = await ocr.aread("https://example.com/screenshot.png")
    print(text)

asyncio.run(main())
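
When several documents need OCR, the async method can be fanned out with a concurrency cap. The following is a minimal sketch, not part of the library: `read_many` and its `limit` parameter are hypothetical names, and the helper takes any coroutine function with the same shape as `ocr.aread` (one source string in, text out).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def read_many(
    aread: Callable[[str], Awaitable[str]],
    sources: Iterable[str],
    limit: int = 3,
) -> Dict[str, str]:
    """Run `aread` over several sources, at most `limit` at a time.

    `aread` is assumed to behave like OCRAgent.aread: it accepts one
    source string and returns the extracted text.
    """
    semaphore = asyncio.Semaphore(limit)

    async def read_one(source: str) -> str:
        # The semaphore keeps no more than `limit` requests in flight.
        async with semaphore:
            return await aread(source)

    source_list = list(sources)
    # gather preserves input order, so zip pairs each source with its text.
    texts = await asyncio.gather(*(read_one(s) for s in source_list))
    return dict(zip(source_list, texts))
```

With the library installed, a call might look like `asyncio.run(read_many(OCRAgent().aread, urls))`. A small `limit` is a conservative choice here, since the provider's rate limits are not described on this page.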

How It Works

| Method | Returns | Use when |
| --- | --- | --- |
| read / aread | str (markdown) | You only need plain text |
| extract / aextract | Full result with pages | You need per-page markdown or metadata |
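
To illustrate how the two return shapes relate, here is a small sketch that flattens a full result into one markdown string. It relies only on the `pages`, `index`, and `markdown` attributes shown in the Quick Start; `pages_to_markdown` is a hypothetical helper, and whether `read` builds its string this way internally is an assumption, not something this page states.

```python
from types import SimpleNamespace  # used only to build a stand-in result


def pages_to_markdown(result, separator="\n\n"):
    """Join per-page markdown into one string, ordered by page index."""
    ordered = sorted(result.pages, key=lambda page: page.index)
    return separator.join(page.markdown for page in ordered)


# Stand-in for an extract() result; a real result comes from OCRAgent.
fake_result = SimpleNamespace(
    pages=[
        SimpleNamespace(index=1, markdown="## Methods"),
        SimpleNamespace(index=0, markdown="# Title"),
    ]
)
combined = pages_to_markdown(fake_result)
```

This keeps the per-page data available for metadata lookups while still producing a single string when a downstream agent only needs text.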

Configuration Options

OCRAgent

Agent class reference

OCRConfig

Configuration dataclass
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| include_image_base64 | bool | False | Include base64-encoded image bytes in the result |
| pages | Optional[List[int]] | None | Specific page indexes to extract (0-indexed) |
| image_limit | Optional[int] | None | Max images to process |
| timeout | int | 600 | Request timeout in seconds |
| api_base | Optional[str] | None | Override provider base URL |
| api_key | Optional[str] | None | Override provider API key |

Common Patterns

from praisonaiagents import OCRAgent

ocr = OCRAgent()
result = ocr.extract("https://example.com/large.pdf", pages=[0, 1, 2])
print(result.pages[0].markdown)
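
For long documents, the `pages` argument can be fed in batches so each request stays small. The sketch below only generates the 0-indexed page lists; `page_batches` is a hypothetical helper, and it assumes the caller already knows the total page count, since this page does not describe a way to query it.

```python
from typing import Iterator, List


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of 0-indexed page numbers.

    Each list is suitable for the `pages=` argument shown above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        # The final batch may be shorter than batch_size.
        yield list(range(start, min(start + batch_size, total_pages)))
```

Each batch could then be passed as `ocr.extract(url, pages=batch)`, which also makes it easy to retry a single failed batch rather than the whole document.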

Providers

Mistral OCR

Provider setup and model options

Best Practices

Local file paths are not supported — upload to a reachable URL or encode as base64 before calling OCRAgent.
Use pages=[0, 1, 2] via OCRConfig or method kwargs to limit cost and latency on multi-hundred-page documents.
Default timeout is 600 seconds. Lower it for quick image OCR; raise it for large scanned PDFs.
Pass api_key on OCRConfig, on OCRAgent(...), or set MISTRAL_API_KEY in the environment — instance config wins over env vars.
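
Since local paths are rejected, a local file has to be encoded before it is passed in. The following sketch uses only the standard library; `file_to_data_uri` is a hypothetical helper, and the data-URI form (`data:<mime>;base64,<payload>`) is an assumption, because this page does not say whether OCRAgent expects a data URI or a raw base64 string.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_uri(path: str) -> str:
    """Read a local file and return it as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Fall back to a generic type when the extension is unknown.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

If the provider turns out to expect the raw payload instead, the part after the comma is the plain base64 string.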

Knowledge

Index extracted text for retrieval

Tools

Give agents document-processing tools