
Browser Agent

Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.

Quick Start

1. Start the Bridge Server

praisonai browser start --port 8765 --model gpt-4o

2. Load the Chrome Extension

  1. Open chrome://extensions
  2. Enable “Developer mode”
  3. Load unpacked extension from praisonai-chrome-extension/dist

3. Use the Side Panel

Click the extension icon or press Ctrl+Shift+P to open the side panel. Enter your goal and the AI will control the browser.

Architecture

Flow: Chrome Extension ↔ WebSocket ↔ Bridge Server ↔ PraisonAI Agent

The system consists of:
  • Chrome Extension: Captures page state and executes actions via CDP
  • Bridge Server: FastAPI WebSocket server that routes messages to agents
  • BrowserAgent: PraisonAI agent that decides actions based on observations
  • SessionManager: SQLite-based persistence for session history
  • Hybrid Mode: Falls back to on-device Gemini Nano if the bridge server is unavailable

Session Flow

Session States

Smart Features

Click Fallbacks

When a click fails, the agent automatically tries these fallbacks in order:
  1. Viewport click using getBoundingClientRect() + scrollIntoView()
  2. JavaScript click via element.click()
  3. Focus + Enter for buttons
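
The fallback order above can be pictured as a chain of strategies tried one after another. The following is a minimal sketch of that idea only, not the extension's actual code: `click_with_fallbacks`, the strategy names, and the stub strategies are all hypothetical and stand in for the real CDP and JavaScript calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical strategy type: takes a CSS selector, returns True on success.
ClickStrategy = Callable[[str], bool]


def click_with_fallbacks(
    selector: str, strategies: List[Tuple[str, ClickStrategy]]
) -> Optional[str]:
    """Try each strategy in order; return the name of the first that succeeds."""
    for name, strategy in strategies:
        try:
            if strategy(selector):
                return name
        except Exception:
            # A raised error counts as a failed attempt; move to the next strategy.
            continue
    return None  # Every strategy failed.


# Stub strategies that simulate different outcomes (illustration only).
def failing(_selector: str) -> bool:
    return False


def raising(_selector: str) -> bool:
    raise RuntimeError("element detached")


def working(_selector: str) -> bool:
    return True


chain = [
    ("primary_click", failing),
    ("viewport_click", raising),
    ("javascript_click", working),
    ("focus_enter", working),
]
result = click_with_fallbacks("a.MV3Tnb", chain)
print(result)
```

Returning `None` when the whole chain fails gives the caller a clear signal to report the failure back to the LLM.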

Goal Context & Self-Correction

Every observation sent to the LLM includes:
  • Original goal: Always visible to prevent drift
  • Action history: Last 5 actions with success/failure status
  • Progress notes: Summary of steps completed
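
As a rough illustration of how these three pieces could be combined into one text block for the LLM, here is a small sketch. The function name `build_context`, the history entry keys (`action`, `success`), and the output layout are assumptions made for this example; the real prompt format may differ.

```python
from typing import Dict, List


def build_context(goal: str, history: List[Dict], notes: str, window: int = 5) -> str:
    """Combine the goal, the most recent actions, and progress notes."""
    lines = [f"Goal: {goal}", "Recent actions:"]
    # Keep only the last `window` actions so the context stays short.
    for entry in history[-window:]:
        status = "ok" if entry["success"] else "FAILED"
        lines.append(f"- {entry['action']} [{status}]")
    lines.append(f"Progress: {notes}")
    return "\n".join(lines)


# Seven recorded actions; only the last five should appear in the context.
history = [{"action": f"step{i}", "success": i != 6} for i in range(7)]
context = build_context("Search praisonai", history, "Typed the query")
print(context)
```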

Failure Communication

When actions fail, the LLM receives explicit feedback:
⛔ LAST ACTION FAILED!
   Error: All click methods failed for: a.MV3Tnb
   → You MUST try a DIFFERENT approach!
This enables the agent to self-correct and find alternate paths.
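
A feedback block like the one above could be produced by a small formatter that is called only when the previous action reported an error. This is a sketch under that assumption; `failure_feedback` is a hypothetical helper, not a documented PraisonAI function.

```python
from typing import Optional


def failure_feedback(error: Optional[str]) -> str:
    """Return the failure notice for the LLM, or an empty string on success."""
    if not error:
        return ""
    return "\n".join(
        [
            "⛔ LAST ACTION FAILED!",
            f"   Error: {error}",
            "   → You MUST try a DIFFERENT approach!",
        ]
    )


feedback = failure_feedback("All click methods failed for: a.MV3Tnb")
print(feedback)
```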

CLI Commands

Run Browser Agent

Execute a goal directly from CLI with live progress display:
praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug  # Show all WebSocket messages
Options:
  • --url, -u: Start URL (default: https://www.google.com)
  • --model, -m: LLM model (default: gpt-4o-mini)
  • --timeout, -t: Timeout in seconds (default: 120)
  • --debug, -d: Debug mode - show all events
Example Output:
🚀 Starting browser agent
   Goal: Go to google and search praisonai
   Model: gpt-4o-mini

Session: 4a703667

Step 0: ▶ TYPE → textarea#APjFqb
        📍 https://www.google.com/

Step 1: ▶ CLICK

Step 2: ▶ CLICK
        📍 https://www.google.com/search?q=praisonai

✅ Task completed!

Tab Management

praisonai browser tabs              # List all tabs
praisonai browser tabs --new https://google.com  # Open new tab
praisonai browser tabs --close TAB_ID    # Close tab
praisonai browser tabs --focus TAB_ID    # Focus tab
praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID

Execute JavaScript

praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"

Page Inspection (New)

Inspect browser pages without the extension:
# List all open pages
praisonai browser pages

# Get DOM tree
praisonai browser dom <PAGE_ID>

# Read page content as text
praisonai browser content <PAGE_ID>

# Capture console logs
praisonai browser console <PAGE_ID>

# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"
These commands work via CDP (Chrome DevTools Protocol) and require Chrome running with --remote-debugging-port=9222.
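
For context, Chrome started with --remote-debugging-port exposes an HTTP endpoint at /json/list that describes its open targets. The sketch below shows how a page list could be derived from that endpoint. Whether the pages command works exactly this way is an assumption, and `fetch_targets` and `only_pages` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_targets(port: int = 9222) -> List[Dict[str, Any]]:
    """Fetch the raw target list from a locally running Chrome debug port."""
    with urlopen(f"http://127.0.0.1:{port}/json/list", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def only_pages(targets: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep regular pages; drop service workers, extension pages, and so on."""
    return [
        {"id": t["id"], "title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


# Sample data in the shape Chrome returns, so the filter can run offline.
sample_targets = [
    {"id": "A1", "type": "page", "title": "Google", "url": "https://www.google.com/"},
    {"id": "B2", "type": "service_worker", "title": "", "url": "chrome-extension://abc/sw.js"},
    {"id": "C3", "type": "page", "title": "GitHub", "url": "https://github.com/"},
]
pages = only_pages(sample_targets)
print(pages)
```

With Chrome running, `only_pages(fetch_targets())` would return the live list instead of the sample data.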

Automation Engines

Choose different execution engines with --engine:
# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension

# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp

# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright

| Engine | Extension | Headless | Multi-Browser |
| --- | --- | --- | --- |
| extension | Required | No | No |
| cdp | No | Yes | Chrome only |
| playwright | No | Yes | Chrome/Firefox/WebKit |

Screenshot

praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png

Start Server

praisonai browser start [OPTIONS]
Options:
  • --port, -p: Port to listen on (default: 8765)
  • --host, -H: Host to bind to (default: 0.0.0.0)
  • --model, -m: LLM model (default: gpt-4o-mini)
  • --max-steps: Maximum steps per session (default: 20)
  • --verbose, -v: Enable verbose logging

List Sessions

praisonai browser sessions [OPTIONS]
Options:
  • --status, -s: Filter by status (running, completed, failed)
  • --limit, -l: Maximum sessions to show

View History

praisonai browser history <SESSION_ID>

Clear Sessions

praisonai browser clear --status completed --yes

Reload Extension

Reload the Chrome extension after making changes:
praisonai browser reload
praisonai browser reload --port 9222  # Custom Chrome debug port

Health Diagnostics

Run health checks for the browser automation system:
praisonai browser doctor          # Run all checks
praisonai browser doctor server   # Check bridge server
praisonai browser doctor chrome   # Check Chrome debugging
praisonai browser doctor extension  # Check extension loaded
praisonai browser doctor db       # Check session database
Example Output:
Browser Health Check

✅ Server: ok
   Connections: 1
   Sessions: 0

✅ Chrome: Chrome/131.0.6778.85
   WebSocket: ws://127.0.0.1:9222/devtools/browser/...

✅ Extension loaded
   URL: chrome-extension://fkmfdklcegbbpipbcimb...

✅ Session database
   Path: ~/.praisonai/browser_sessions.db
   Sessions: 42
   Steps: 387

Python API

from praisonai.browser import BrowserServer, BrowserAgent

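# Option 1: start the bridge server.
# Note: start() blocks, so code after it runs only once the server stops.
server = BrowserServer(port=8765, model="gpt-4o")
server.start()

# Option 2: create an agent directly and pass it a page observation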
agent = BrowserAgent(model="gpt-4o")
action = agent.process_observation({
    "task": "Search for AI frameworks",
    "url": "https://google.com",
    "title": "Google",
    "elements": [{"selector": "#search", "tag": "input", "text": ""}]
})

Session Management

from praisonai.browser.sessions import SessionManager

manager = SessionManager()

# Create session
session = manager.create_session("Find best restaurants")
print(session["session_id"])

# List sessions
sessions = manager.list_sessions(status="running")

# Get session details with steps
details = manager.get_session(session["session_id"])
for step in details["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")

Hybrid Mode (Extension)

The Chrome Extension supports hybrid mode:
  1. Bridge Mode: Connect to PraisonAI server for cloud LLMs
  2. Built-in Mode: Use Chrome’s Gemini Nano on-device
If the bridge server is unavailable, the extension automatically falls back to Built-in Mode.
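
The fallback decision can be sketched as a reachability probe followed by a mode choice. The extension itself runs in the browser, so this Python version only illustrates the logic. `bridge_available` and `choose_mode` are hypothetical names, and the plain TCP probe is an assumption about how availability might be checked.

```python
import socket
from typing import Callable


def bridge_available(host: str = "localhost", port: int = 8765, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_mode(probe: Callable[[], bool]) -> str:
    """Prefer the bridge server; fall back to the built-in on-device model."""
    return "bridge" if probe() else "builtin"


# The probe is injected so the decision logic can be exercised offline.
print(choose_mode(lambda: True))
print(choose_mode(lambda: False))
```

In practice the call would be `choose_mode(bridge_available)`, which checks the default bridge port of 8765.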

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Ctrl+Shift+P | Toggle side panel |
| Alt+A | Start agent |
| Alt+S | Capture screenshot |

Supported Actions

| Action | Description |
| --- | --- |
| click | Click on element |
| type | Enter text |
| scroll | Scroll page |
| navigate | Go to URL |
| wait | Wait for page |
| screenshot | Capture screen |
| done | Task complete |
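
A client that receives actions may want to reject anything outside this list before executing it. The following is a minimal sketch of such a check. `validate_action` is a hypothetical helper, and the message shape (a dict with an `action` key) follows the action message shown in the WebSocket Protocol section.

```python
from typing import Any, Dict

# The action names from the table above.
SUPPORTED_ACTIONS = {"click", "type", "scroll", "navigate", "wait", "screenshot", "done"}


def validate_action(message: Dict[str, Any]) -> str:
    """Return the action name, or raise ValueError if it is not supported."""
    action = message.get("action")
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    return action


checked = validate_action({"type": "action", "action": "click", "selector": "#search"})
print(checked)
```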

WebSocket Protocol

Connect to ws://localhost:8765/ws and send/receive JSON messages:
// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}

// Send observation
{"type": "observation", "session_id": "...", "task": "...", 
 "url": "...", "elements": [...]}

// Receive action
{"type": "action", "action": "click", "selector": "#search", 
 "thought": "Clicking search button"}

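The messages above are plain JSON, so they can be built and parsed with the standard library before being handed to any WebSocket client. The sketch below uses only the fields shown in this section. The helper names are hypothetical, and the transport itself (connecting to ws://localhost:8765/ws) is left out on purpose.

```python
import json
from typing import Any, Dict, List


def start_session_message(goal: str, model: str) -> str:
    """Serialize a start_session message."""
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def observation_message(
    session_id: str, task: str, url: str, elements: List[Dict[str, Any]]
) -> str:
    """Serialize an observation for an existing session."""
    return json.dumps(
        {
            "type": "observation",
            "session_id": session_id,
            "task": task,
            "url": url,
            "elements": elements,
        }
    )


def parse_action(raw: str) -> Dict[str, Any]:
    """Decode a server message and make sure it is an action."""
    message = json.loads(raw)
    if message.get("type") != "action":
        raise ValueError(f"Expected an action message, got: {message.get('type')!r}")
    return message


outgoing = start_session_message("Find flights to Paris", "gpt-4o")
incoming = parse_action(
    '{"type": "action", "action": "click", "selector": "#search", '
    '"thought": "Clicking search button"}'
)
print(outgoing)
print(incoming["action"], incoming["selector"])
```
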
Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key for GPT models |
| ANTHROPIC_API_KEY | Anthropic API key for Claude |
| GOOGLE_API_KEY | Google API key for Gemini |
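
Before starting the server, it can help to confirm that the key for the chosen model is set. The sketch below shows one way to do that. The mapping from model-name prefix to variable (`gpt`, `claude`, `gemini`) is an assumption made for illustration, and `required_key` and `missing_key` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from model-name prefix to the variable in the table above.
KEY_BY_PREFIX = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def required_key(model: str) -> Optional[str]:
    """Return the environment variable a model is assumed to need, if known."""
    for prefix, variable in KEY_BY_PREFIX.items():
        if model.lower().startswith(prefix):
            return variable
    return None


def missing_key(model: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the required variable if it is unset or empty."""
    variable = required_key(model)
    if variable is not None and not env.get(variable):
        return variable
    return None


problem = missing_key("gpt-4o-mini", os.environ)
if problem:
    print(f"Set {problem} before starting the browser agent")
```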