Browser Agent
Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.
Quick Start
1. Start the Bridge Server
praisonai browser start --port 8765 --model gpt-4o
2. Load the Chrome Extension
- Open chrome://extensions
- Enable “Developer mode”
- Click “Load unpacked” and select praisonai-chrome-extension/dist
3. Use the Side Panel
Click the extension icon or press Ctrl+Shift+P to open the side panel. Enter a goal, and the agent will control the browser to carry it out.
Architecture
Flow: Chrome Extension ↔ WebSocket ↔ Bridge Server ↔ PraisonAI Agent
The system consists of:
- Chrome Extension: Captures page state and executes actions via CDP
- Bridge Server: FastAPI WebSocket server that routes messages to agents
- BrowserAgent: PraisonAI agent that decides actions based on observations
- SessionManager: SQLite-based persistence for session history
- Hybrid Mode: Falls back to on-device Gemini Nano if the bridge server is unavailable
Smart Features
Click Fallbacks
When a click fails, the agent automatically tries these fallbacks:
- Viewport click using getBoundingClientRect() + scrollIntoView()
- JavaScript click via element.click()
- Focus + Enter for buttons
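The fallback idea can be sketched as a chain of strategies that are tried in turn until one succeeds, with every failure collected for the error message. This is a minimal Python illustration, not the extension's actual code: `click_with_fallbacks` and the three stub strategies are hypothetical, list order is assumed to be the try order, and real strategies would drive the page over CDP instead of returning canned results.

```python
from typing import Callable, List, Tuple

# A strategy receives a CSS selector and returns True on success (or raises on failure).
ClickStrategy = Callable[[str], bool]


def click_with_fallbacks(selector: str, strategies: List[Tuple[str, ClickStrategy]]) -> str:
    """Try each strategy in list order and return the name of the first one that succeeds.

    Raises RuntimeError with every collected error if all strategies fail.
    """
    errors: List[str] = []
    for name, strategy in strategies:
        try:
            if strategy(selector):
                return name
            errors.append(f"{name}: returned False")
        except Exception as exc:  # one failing strategy must not abort the chain
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"All click methods failed for: {selector} [{'; '.join(errors)}]")


# Hypothetical stand-ins for the real strategies.
def viewport_click(selector: str) -> bool:
    raise RuntimeError("element not clickable at point")


def js_click(selector: str) -> bool:
    return True


def focus_and_enter(selector: str) -> bool:
    return True


used = click_with_fallbacks("a.MV3Tnb", [
    ("viewport", viewport_click),
    ("js_click", js_click),
    ("focus_enter", focus_and_enter),
])
print(used)  # js_click
```

Raising only after the whole chain is exhausted means a single error string can summarize every attempt, which is the kind of text the failure banner below passes back to the LLM.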
Goal Context & Self-Correction
Every observation sent to the LLM includes:
- Original goal: Always visible to prevent drift
- Action history: Last 5 actions with success/failure status
- Progress notes: Summary of steps completed
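A bounded action history like this maps naturally onto `collections.deque(maxlen=5)`, which discards the oldest entry automatically. The sketch below is an assumption about how such a context block could be assembled; the `GoalContext` class and its text layout are hypothetical and not taken from PraisonAI.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class GoalContext:
    goal: str
    # deque(maxlen=5) drops the oldest entry automatically, keeping only the last 5 actions.
    history: Deque[Tuple[str, bool]] = field(default_factory=lambda: deque(maxlen=5))
    progress: List[str] = field(default_factory=list)

    def record(self, action: str, success: bool, note: str = "") -> None:
        self.history.append((action, success))
        if success and note:
            self.progress.append(note)

    def render(self) -> str:
        """Text block prepended to every observation so the goal never drops out of view."""
        lines = [f"ORIGINAL GOAL: {self.goal}", "RECENT ACTIONS:"]
        for action, ok in self.history:
            lines.append(f"  [{'OK' if ok else 'FAILED'}] {action}")
        if self.progress:
            lines.append("PROGRESS: " + "; ".join(self.progress))
        return "\n".join(lines)


ctx = GoalContext("Search Google for praisonai")
for i in range(7):
    ctx.record(f"step-{i}", success=(i != 3), note=f"finished step {i}")
print(ctx.render())
```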
Failure Communication
When actions fail, the LLM receives explicit feedback:
⛔ LAST ACTION FAILED!
Error: All click methods failed for: a.MV3Tnb
→ You MUST try a DIFFERENT approach!
This enables the agent to self-correct and find alternate paths.
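A minimal sketch of producing that banner, assuming it is simply prepended to the next observation whenever the previous action reported an error. The function names `failure_feedback` and `build_prompt` are illustrative only.

```python
from typing import Optional

FAILURE_TEMPLATE = (
    "⛔ LAST ACTION FAILED!\n"
    "Error: {error}\n"
    "→ You MUST try a DIFFERENT approach!"
)


def failure_feedback(last_error: Optional[str]) -> str:
    """Banner for the next observation; empty when the last action succeeded."""
    return FAILURE_TEMPLATE.format(error=last_error) if last_error else ""


def build_prompt(observation_text: str, last_error: Optional[str]) -> str:
    # Put the banner first so the model reads the failure before the new page state.
    banner = failure_feedback(last_error)
    return f"{banner}\n\n{observation_text}" if banner else observation_text
```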
CLI Commands
Run Browser Agent
Execute a goal directly from the CLI with a live progress display:
praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug # Show all WebSocket messages
Options:
--url, -u: Start URL (default: https://www.google.com)
--model, -m: LLM model (default: gpt-4o-mini)
--timeout, -t: Timeout in seconds (default: 120)
--debug, -d: Debug mode - show all events
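To drive the CLI from a Python script, the command can be assembled as an argument list. This helper is hypothetical and uses only the flags documented on this page (`--url`, `--model`, `--timeout`, `--debug`, and `--engine` from the engines section); anything left unset falls back to the CLI's own defaults.

```python
import shlex
import subprocess
from typing import List, Optional


def build_run_command(goal: str, url: Optional[str] = None, model: Optional[str] = None,
                      timeout: Optional[int] = None, engine: Optional[str] = None,
                      debug: bool = False) -> List[str]:
    """Assemble argv for `praisonai browser run`; omitted options keep the CLI defaults."""
    cmd = ["praisonai", "browser", "run", goal]
    if url:
        cmd += ["--url", url]
    if model:
        cmd += ["--model", model]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if engine:
        cmd += ["--engine", engine]
    if debug:
        cmd.append("--debug")
    return cmd


def run_goal(goal: str, **options) -> int:
    # An argv list bypasses the shell, so goals with spaces or quotes need no escaping.
    return subprocess.run(build_run_command(goal, **options)).returncode


print(shlex.join(build_run_command("Find flights to Paris", model="gpt-4o", timeout=180)))
# praisonai browser run 'Find flights to Paris' --model gpt-4o --timeout 180
```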
Example Output:
🚀 Starting browser agent
Goal: Go to google and search praisonai
Model: gpt-4o-mini
Session: 4a703667
Step 0: ▶ TYPE → textarea#APjFqb
📍 https://www.google.com/
Step 1: ▶ CLICK
Step 2: ▶ CLICK
📍 https://www.google.com/search?q=praisonai
✅ Task completed!
Tab Management
praisonai browser tabs # List all tabs
praisonai browser tabs --new https://google.com # Open new tab
praisonai browser tabs --close TAB_ID # Close tab
praisonai browser tabs --focus TAB_ID # Focus tab
Navigate
praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID
Execute JavaScript
praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"
Page Inspection (New)
Inspect browser pages without the extension:
# List all open pages
praisonai browser pages
# Get DOM tree
praisonai browser dom <PAGE_ID>
# Read page content as text
praisonai browser content <PAGE_ID>
# Capture console logs
praisonai browser console <PAGE_ID>
# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"
These commands work via CDP (Chrome DevTools Protocol) and require Chrome
running with --remote-debugging-port=9222.
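To check that Chrome's debugging port is reachable before using these commands, you can query Chrome's DevTools HTTP endpoint `/json/list` directly. The sketch below uses only the standard library; the assumption that the listed target IDs are what `<PAGE_ID>` expects is unverified, and `parse_pages`/`list_pages` are illustrative names.

```python
import json
import urllib.request
from typing import Dict, List


def parse_pages(raw: str) -> List[Dict[str, str]]:
    """Keep only targets of type "page" (drops service workers, iframes, background pages)."""
    return [
        {"id": t["id"], "title": t.get("title", ""), "url": t.get("url", "")}
        for t in json.loads(raw)
        if t.get("type") == "page"
    ]


def list_pages(port: int = 9222) -> List[Dict[str, str]]:
    # /json/list is Chrome's DevTools HTTP endpoint that enumerates debuggable targets.
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/list", timeout=5) as resp:
        return parse_pages(resp.read().decode("utf-8"))
```

With Chrome started with --remote-debugging-port=9222, `list_pages()` returns one dict per open tab. If Chrome is not running with that flag, it raises `urllib.error.URLError`.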
Automation Engines
Choose different execution engines with --engine:
# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension
# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp
# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright
| Engine | Extension | Headless | Multi-Browser |
|---|---|---|---|
| extension | Required | No | No |
| cdp | No | Yes | Chrome only |
| playwright | No | Yes | Chrome/Firefox/WebKit |
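The engine table can also be read as a simple decision rule. The sketch below encodes the table's capabilities and picks the first engine, in the table's order, that satisfies the stated requirements. The preference order and the `pick_engine` helper are assumptions for illustration; the CLI itself only takes an explicit `--engine` value.

```python
from typing import Optional

# Capabilities transcribed from the engine table; dict order is the preference order.
ENGINES = {
    "extension": {"needs_extension": True, "headless": False, "browsers": {"chrome"}},
    "cdp": {"needs_extension": False, "headless": True, "browsers": {"chrome"}},
    "playwright": {"needs_extension": False, "headless": True,
                   "browsers": {"chrome", "firefox", "webkit"}},
}


def pick_engine(headless: bool = False, browser: str = "chrome",
                extension_available: bool = True) -> Optional[str]:
    """Return the first engine meeting every requirement, or None if none does."""
    for name, caps in ENGINES.items():
        if caps["needs_extension"] and not extension_available:
            continue
        if headless and not caps["headless"]:
            continue
        if browser not in caps["browsers"]:
            continue
        return name
    return None
```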
Screenshot
praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png
Start Server
praisonai browser start [OPTIONS]
Options:
--port, -p: Port to listen on (default: 8765)
--host, -H: Host to bind to (default: 0.0.0.0)
--model, -m: LLM model (default: gpt-4o-mini)
--max-steps: Maximum steps per session (default: 20)
--verbose, -v: Enable verbose logging
List Sessions
praisonai browser sessions [OPTIONS]
Options:
--status, -s: Filter by status (running, completed, failed)
--limit, -l: Maximum number of sessions to show
View History
praisonai browser history <SESSION_ID>
Clear Sessions
praisonai browser clear --status completed --yes
Reload Extension
Reload the Chrome extension after making changes:
praisonai browser reload
praisonai browser reload --port 9222 # Custom Chrome debug port
Health Diagnostics
Run health checks for the browser automation system:
praisonai browser doctor # Run all checks
praisonai browser doctor server # Check bridge server
praisonai browser doctor chrome # Check Chrome debugging
praisonai browser doctor extension # Check extension loaded
praisonai browser doctor db # Check session database
Example Output:
Browser Health Check
✅ Server: ok
Connections: 1
Sessions: 0
✅ Chrome: Chrome/131.0.6778.85
WebSocket: ws://127.0.0.1:9222/devtools/browser/...
✅ Extension loaded
URL: chrome-extension://fkmfdklcegbbpipbcimb...
✅ Session database
Path: ~/.praisonai/browser_sessions.db
Sessions: 42
Steps: 387
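The Chrome line of this report (browser version plus a `ws://127.0.0.1:9222/devtools/browser/...` URL) matches the fields Chrome returns from its `/json/version` DevTools endpoint (`Browser` and `webSocketDebuggerUrl`). A standalone check along those lines might look like this; whether `praisonai browser doctor chrome` works exactly this way is an assumption, and `check_chrome` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Dict


def summarize_version(info: Dict[str, Any]) -> Dict[str, str]:
    """Reduce Chrome's /json/version payload to the two fields shown in the report."""
    return {
        "browser": info.get("Browser", "unknown"),
        "websocket": info.get("webSocketDebuggerUrl", ""),
    }


def check_chrome(port: int = 9222) -> Dict[str, str]:
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/version", timeout=3) as resp:
            return {"status": "ok", **summarize_version(json.load(resp))}
    except OSError as exc:  # URLError and connection refusals are OSError subclasses
        return {"status": "error", "detail": f"Chrome not reachable on port {port}: {exc}"}
```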
Python API
from praisonai.browser import BrowserServer, BrowserAgent
# Start server
server = BrowserServer(port=8765, model="gpt-4o")
server.start() # Blocks
# Or create agent directly
agent = BrowserAgent(model="gpt-4o")
action = agent.process_observation({
"task": "Search for AI frameworks",
"url": "https://google.com",
"title": "Google",
"elements": [{"selector": "#search", "tag": "input", "text": ""}]
})
Session Management
from praisonai.browser.sessions import SessionManager
manager = SessionManager()
# Create session
session = manager.create_session("Find best restaurants")
session_id = session["session_id"]
print(session_id)
# List sessions
sessions = manager.list_sessions(status="running")
# Get session details with steps
details = manager.get_session(session_id)
for step in details["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")
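To show what SQLite-based session persistence involves, here is a minimal standalone sketch with one table for sessions and one for steps. The schema, column names and helper functions are hypothetical and are not the actual layout of ~/.praisonai/browser_sessions.db; it uses an in-memory database by default.

```python
import sqlite3
import uuid
from typing import List, Optional, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    goal       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE IF NOT EXISTS steps (
    session_id  TEXT NOT NULL REFERENCES sessions(session_id),
    step_number INTEGER NOT NULL,
    action      TEXT NOT NULL,
    PRIMARY KEY (session_id, step_number)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def create_session(conn: sqlite3.Connection, goal: str) -> str:
    session_id = uuid.uuid4().hex[:8]  # short hex ID, similar in shape to "4a703667"
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sessions (session_id, goal) VALUES (?, ?)", (session_id, goal))
    return session_id


def add_step(conn: sqlite3.Connection, session_id: str, step_number: int, action: str) -> None:
    with conn:
        conn.execute("INSERT INTO steps (session_id, step_number, action) VALUES (?, ?, ?)",
                     (session_id, step_number, action))


def set_status(conn: sqlite3.Connection, session_id: str, status: str) -> None:
    with conn:
        conn.execute("UPDATE sessions SET status = ? WHERE session_id = ?", (status, session_id))


def list_sessions(conn: sqlite3.Connection,
                  status: Optional[str] = None) -> List[Tuple[str, str, str]]:
    query = "SELECT session_id, goal, status FROM sessions"
    if status is None:
        return conn.execute(query).fetchall()
    return conn.execute(query + " WHERE status = ?", (status,)).fetchall()


def get_steps(conn: sqlite3.Connection, session_id: str) -> List[Tuple[int, str]]:
    return conn.execute(
        "SELECT step_number, action FROM steps WHERE session_id = ? ORDER BY step_number",
        (session_id,),
    ).fetchall()
```

Keeping steps in their own table keyed by `(session_id, step_number)` lets a history view stream steps in order without loading every session.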
Hybrid Mode (Extension)
The Chrome Extension supports hybrid mode:
- Bridge Mode: Connect to PraisonAI server for cloud LLMs
- Built-in Mode: Use Chrome’s Gemini Nano on-device
If the bridge server is unavailable, it automatically falls back to built-in AI.
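The fallback decision itself runs inside the extension, not in Python. The sketch below only illustrates the logic, under the assumption that "unavailable" means the bridge port does not accept a TCP connection. The probe is injectable so the choice can be tested without a running server; `bridge_reachable` and `choose_mode` are hypothetical names.

```python
import socket
from typing import Callable


def bridge_reachable(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> bool:
    """True if a TCP connection to the bridge server's port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_mode(probe: Callable[[], bool] = bridge_reachable) -> str:
    # Prefer the bridge (cloud LLMs); fall back to on-device Gemini Nano otherwise.
    return "bridge" if probe() else "built-in"
```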
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+P | Toggle side panel |
| Alt+A | Start agent |
| Alt+S | Capture screenshot |
Supported Actions
| Action | Description |
|---|---|
| click | Click on element |
| type | Enter text |
| scroll | Scroll page |
| navigate | Go to URL |
| wait | Wait for page |
| screenshot | Capture screen |
| done | Task complete |
WebSocket Protocol
Connect to ws://localhost:8765/ws and send/receive JSON messages:
// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}
// Send observation
{"type": "observation", "session_id": "...", "task": "...",
"url": "...", "elements": [...]}
// Receive action
{"type": "action", "action": "click", "selector": "#search",
"thought": "Clicking search button"}
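A client for this protocol mainly needs to serialize the two outgoing message types and validate incoming actions. The standard-library sketch below builds the messages with the fields shown above and checks the `action` value against the Supported Actions table. The helper names are hypothetical, the real server may send or accept additional fields, and actually sending the messages to ws://localhost:8765/ws would need a WebSocket client library (for example the third-party `websockets` package).

```python
import json
from typing import Any, Dict, List

SUPPORTED_ACTIONS = {"click", "type", "scroll", "navigate", "wait", "screenshot", "done"}


def start_session_msg(goal: str, model: str = "gpt-4o") -> str:
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def observation_msg(session_id: str, task: str, url: str,
                    elements: List[Dict[str, Any]]) -> str:
    return json.dumps({"type": "observation", "session_id": session_id,
                       "task": task, "url": url, "elements": elements})


def parse_action(raw: str) -> Dict[str, Any]:
    """Decode a server message and reject anything that is not a known action."""
    msg = json.loads(raw)
    if msg.get("type") != "action":
        raise ValueError(f"expected an action message, got {msg.get('type')!r}")
    if msg.get("action") not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {msg.get('action')!r}")
    return msg
```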
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for GPT models |
| ANTHROPIC_API_KEY | Anthropic API key for Claude models |
| GOOGLE_API_KEY | Google API key for Gemini models |
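A quick pre-flight check can catch a missing key before a session starts. The sketch assumes that the model name (optionally with a `provider/` prefix, as in LiteLLM-style names) determines which variable from the table is needed. The prefix mapping and helper names are illustrative assumptions, not PraisonAI's own resolution logic.

```python
import os
from typing import Optional

# Assumed mapping from model-name prefix to the variable in the table.
KEY_FOR_PREFIX = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def required_key(model: str) -> Optional[str]:
    name = model.split("/")[-1].lower()  # "anthropic/claude-..." -> "claude-..."
    for prefix, var in KEY_FOR_PREFIX.items():
        if name.startswith(prefix):
            return var
    return None


def missing_key(model: str) -> Optional[str]:
    """Name of the required variable if it is unset or empty, else None."""
    var = required_key(model)
    return var if var and not os.environ.get(var) else None
```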