Browser Agent Deep Dive
This guide explains how PraisonAI browser automation works under the hood, covering all execution modes, APIs, and integration patterns.
Architecture Overview
Instruction Flow (Extension Mode)
This diagram shows exactly how your instruction flows through the system:
Key Points
- User → CLI: User provides goal via praisonai browser run "goal"
- CLI → Server: CLI connects over WebSocket, sends start_session
- Server → Extension: Server forwards start_automation to Chrome extension
- Extension → Page: Extension uses CDP to capture page state (DOM, screenshot)
- Extension → Server: Sends observation with page details
- Server → Agent: Passes observation to BrowserAgent
- Agent → LLM: Agent builds prompt with goal + context, gets action from LLM
- Server → Extension: Forwards action (click, type, scroll, etc.)
- Extension → Page: Executes action via CDP
- Loop: Repeats until the agent returns done: true or the session times out
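The observe, decide, act cycle in the list above can be sketched in a few lines. This is an illustration only, not PraisonAI's implementation: run_loop, the fake_* stand-ins, the max_steps cap (used here in place of a wall-clock timeout), and the dictionary shapes are all assumptions made for the example. Only the done flag comes from the steps above.

```python
from typing import Callable, Dict, List

Observation = Dict[str, object]
Action = Dict[str, object]


def run_loop(
    goal: str,
    observe: Callable[[], Observation],
    decide: Callable[[str, Observation], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe, decide, act; stop when the agent sets done or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()             # Extension -> Server
        action = decide(goal, observation)  # Server -> Agent -> LLM
        history.append(action)
        if action.get("done"):              # Agent signals completion
            break
        execute(action)                     # Server -> Extension -> Page
    return history


# Hypothetical stand-ins for the extension and the LLM-backed agent.
page = {"url": "https://google.com", "typed": ""}


def fake_observe() -> Observation:
    return dict(page)


def fake_decide(goal: str, obs: Observation) -> Action:
    if not obs["typed"]:
        return {"action": "type", "selector": "#search", "text": goal}
    return {"action": "done", "done": True}


def fake_execute(action: Action) -> None:
    if action["action"] == "type":
        page["typed"] = str(action["text"])


steps = run_loop("PraisonAI", fake_observe, fake_decide, fake_execute)
print([s["action"] for s in steps])
```

Passing the three stages in as callables keeps the loop independent of the transport, which mirrors how the same cycle applies whether actions travel through the extension, raw CDP, or Playwright.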
Execution Modes
Comparison Table
| Feature | Extension Mode | CDP Mode | Playwright Mode |
|---|---|---|---|
| Requires Extension | ✅ Yes | ❌ No | ❌ No |
| Headless Support | ❌ No | ✅ Yes | ✅ Yes |
| Multi-Browser | Chrome only | Chrome only | Chrome, Firefox, WebKit |
| Session Limit | 1 at a time | Unlimited | Unlimited |
| API Used | chrome.debugger | CDP WebSocket | Playwright API |
| Best For | Interactive use | Headless automation | Cross-browser testing |
| Extra Dependencies | None | aiohttp, websockets | playwright |
Extension Mode (Default)
How It Works
Backend APIs
| API | Location | Purpose |
|---|---|---|
| chrome.debugger.attach() | Extension | Attach to tab for control |
| chrome.debugger.sendCommand() | Extension | Execute CDP commands |
| Runtime.evaluate | CDP | Execute JavaScript |
| Input.dispatchMouseEvent | CDP | Simulate clicks |
| Input.dispatchKeyEvent | CDP | Simulate typing |
| Page.captureScreenshot | CDP | Take screenshots |
CLI Usage
# Basic usage (extension mode is default)
praisonai browser run "Search for PraisonAI on Google"
# With options
praisonai browser run "task" \
  --url https://google.com \
  --model gpt-4o \
  --timeout 120 \
  --debug
Programmatic Usage
from praisonai.browser import BrowserServer, BrowserAgent
# Start server
server = BrowserServer(port=8765, model="gpt-4o-mini")
server.start_background() # Runs in thread
# Create agent (requires extension to be connected)
agent = BrowserAgent(model="gpt-4o-mini", session_id="my-session")
# Agent is called via WebSocket, not directly
# Extension sends observations → Server calls agent → Extension executes actions
Limitations
Extension mode supports only one session at a time because Chrome’s chrome.debugger API only allows one debugger attachment per tab.
CDP Mode
How It Works
Backend APIs
| API | Purpose | Example |
|---|---|---|
| GET /json | List all pages | http://localhost:9222/json |
| DOM.getDocument | Get DOM tree | {"depth": 4} |
| DOM.querySelector | Find element | {"selector": "#search"} |
| Runtime.evaluate | Execute JS | {"expression": "..."} |
| Input.dispatchMouseEvent | Click | {"type": "mousePressed", ...} then {"type": "mouseReleased", ...} |
| Input.dispatchKeyEvent | Type | {"type": "keyDown", ...} |
| Page.navigate | Go to URL | {"url": "..."} |
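Each CDP command in the table travels over the WebSocket as a JSON object with an id, a method, and params. The sketch below builds such messages without opening a connection. The helper names cdp_message and click_messages are made up for this example and are not part of praisonai. Note that CDP has no single "click" event type, so a click is modelled as mousePressed followed by mouseReleased at the same coordinates.

```python
import itertools
import json
from typing import Any, Dict, List, Optional

_ids = itertools.count(1)  # CDP replies echo the id, so each command gets a fresh one


def cdp_message(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def click_messages(x: float, y: float) -> List[str]:
    """A click is a press followed by a release at the same coordinates."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_message("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
        cdp_message("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
    ]


navigate = cdp_message("Page.navigate", {"url": "https://example.com"})
title = cdp_message(
    "Runtime.evaluate", {"expression": "document.title", "returnByValue": True}
)
click = click_messages(120, 240)
print(navigate)
```

The id matters because responses arrive asynchronously and may be interleaved with events; matching on id is how a client pairs a reply with its command.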
Prerequisites
# Start Chrome with remote debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug
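Once Chrome is running with the debug port open, the GET /json endpoint returns one JSON object per target. A rough way to check the setup from Python, using only the standard library, is sketched below. The function names parse_targets and fetch_targets and the sample data are assumptions for illustration; praisonai's own get_pages may behave differently.

```python
import json
import urllib.request
from typing import Dict, List


def parse_targets(raw_json: str) -> List[Dict[str, str]]:
    """Keep only real tabs (type == "page") from a /json response body."""
    return [
        {
            "id": t["id"],
            "title": t.get("title", ""),
            "url": t.get("url", ""),
            "ws": t.get("webSocketDebuggerUrl", ""),
        }
        for t in json.loads(raw_json)
        if t.get("type") == "page"
    ]


def fetch_targets(port: int = 9222, timeout: float = 2.0) -> List[Dict[str, str]]:
    """Query a locally running Chrome; raises URLError if the port is closed."""
    url = f"http://localhost:{port}/json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_targets(resp.read().decode("utf-8"))


# Made-up sample body, shaped like a /json response, so the parser runs offline.
SAMPLE = json.dumps([
    {"id": "A1", "type": "page", "title": "Google", "url": "https://google.com",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/A1"},
    {"id": "B2", "type": "service_worker", "title": "worker",
     "url": "chrome-extension://x/sw.js"},
])
tabs = parse_targets(SAMPLE)
print(tabs)
```

Calling fetch_targets() against a live browser is a quick way to tell a "Connection refused" problem (Chrome not started with the flag) from a "No pages returned" problem (Chrome running but no tabs open).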
CLI Usage
# Use CDP mode explicitly
praisonai browser run "Search for AI" --engine cdp
# CDP-specific commands (no extension needed)
praisonai browser pages # List all tabs
praisonai browser dom <PAGE_ID> # Get DOM tree
praisonai browser content <PAGE_ID> # Read page text
praisonai browser js <PAGE_ID> "document.title" # Execute JS
praisonai browser console <PAGE_ID> # Capture logs
Programmatic Usage
from praisonai.browser.cdp_agent import CDPBrowserAgent
from praisonai.browser.cdp_utils import get_pages, execute_js, get_dom, read_page

# Direct CDP control
async def example():
    # List pages
    pages = await get_pages(port=9222)
    for page in pages:
        print(f"{page.id}: {page.title}")
    # Execute JavaScript
    result = await execute_js(pages[0].id, "document.title")
    print(f"Title: {result}")
    # Get DOM
    dom = await get_dom(pages[0].id, depth=3)
    # Read page content
    content = await read_page(pages[0].id)

# Run full automation
async def automate():
    agent = CDPBrowserAgent(
        port=9222,
        model="gpt-4o-mini",
        headless=False,
    )
    result = await agent.run("Search for PraisonAI on Google")
    print(result)
Sync Wrappers
from praisonai.browser.cdp_utils import (
    get_pages_sync,
    execute_js_sync,
    get_dom_sync,
    read_page_sync,
    get_console_sync,
)
# Synchronous usage
pages = get_pages_sync(port=9222)
title = execute_js_sync(pages[0].id, "document.title")
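A synchronous wrapper around an async function is usually a thin layer over asyncio.run. The sketch below shows the general technique. It is not the source of praisonai's *_sync helpers, whose internals are not shown in this guide; to_sync and fetch_title are hypothetical names used only here.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


def to_sync(async_fn: Callable[..., Awaitable[T]]) -> Callable[..., T]:
    """Return a blocking version of async_fn that runs its own event loop."""
    @functools.wraps(async_fn)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        return asyncio.run(async_fn(*args, **kwargs))
    return wrapper


# Hypothetical async helper standing in for a CDP call.
async def fetch_title(page_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for real network I/O
    return f"title-of-{page_id}"


fetch_title_sync = to_sync(fetch_title)
print(fetch_title_sync("ABC123"))
```

One caveat of this pattern: asyncio.run raises RuntimeError when called from code that is already inside a running event loop, so wrappers built this way belong in scripts and CLIs, while async code should await the original coroutine directly.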
Playwright Mode
How It Works
Backend APIs
| Playwright API | Purpose | CDP Equivalent |
|---|---|---|
| page.goto(url) | Navigate | Page.navigate |
| page.click(selector) | Click element | Input.dispatchMouseEvent |
| page.fill(selector, text) | Type text | Input.dispatchKeyEvent |
| page.screenshot() | Capture screenshot | Page.captureScreenshot |
| page.content() | Get HTML | DOM.getDocument |
| page.evaluate(fn) | Execute JS | Runtime.evaluate |
| page.wait_for_selector() | Wait for element | Custom polling |
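The "custom polling" in the last row refers to repeatedly checking for an element until it appears or a deadline passes. A minimal sketch of that technique follows. wait_until and element_present are hypothetical names; in practice the check would evaluate something like document.querySelector(...) in the page.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


# Hypothetical check that succeeds on the third poll.
calls = {"n": 0}


async def element_present() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


found = asyncio.run(wait_until(element_present, timeout=1.0, interval=0.01))
print(found, calls["n"])
```

Using time.monotonic for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.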
Prerequisites
# Install Playwright
pip install playwright
# Install browsers
playwright install chromium
playwright install firefox # Optional
playwright install webkit # Optional
CLI Usage
# Use Playwright mode
praisonai browser run "Search for AI" --engine playwright
# Headless mode
praisonai browser run "task" --engine playwright --headless
Programmatic Usage
from praisonai.browser.playwright_agent import PlaywrightBrowserAgent
async def example():
    agent = PlaywrightBrowserAgent(
        model="gpt-4o-mini",
        browser_type="chromium",  # or "firefox", "webkit"
        headless=True,
    )
    result = await agent.run(
        goal="Search for PraisonAI on Google",
        start_url="https://google.com",
    )
    print(result)
Page Inspection Commands
These commands work via CDP (Chrome must be running with --remote-debugging-port=9222).
CLI Reference
| Command | Description | Example |
|---|---|---|
| praisonai browser pages | List all browser tabs | Shows ID, title, URL |
| praisonai browser dom <id> | Get DOM tree | --depth 4 |
| praisonai browser content <id> | Read page as text | --limit 2000 |
| praisonai browser console <id> | Capture console logs | --timeout 2 |
| praisonai browser js <id> "code" | Execute JavaScript | Returns result |
Python API
from praisonai.browser.cdp_utils import (
    get_pages,         # async: List[PageInfo]
    get_dom,           # async: Dict (DOM tree)
    read_page,         # async: str (page text)
    get_console,       # async: List[Dict] (log entries)
    execute_js,        # async: Any (JS result)
    wait_for_element,  # async: bool
)
# Synchronous versions also available:
# get_pages_sync, get_dom_sync, read_page_sync, etc.
Agent Memory & Sessions
Session Isolation
Each browser session uses the Agent’s built-in session management:
from praisonai.browser import BrowserAgent
# Session is auto-generated or can be provided
agent = BrowserAgent(
    model="gpt-4o-mini",
    session_id="my-unique-session",  # For memory isolation
)
# Reset for new session (clears chat_history)
agent.reset(new_session_id="new-session-id")
Memory Flow
Common Patterns
Sequential Tasks (CDP Mode)
import asyncio
from praisonai.browser.cdp_agent import CDPBrowserAgent
async def run_multiple_tasks():
    agent = CDPBrowserAgent(port=9222, model="gpt-4o-mini")
    tasks = [
        "Search for AI on Google",
        "Go to GitHub and search for praisonai",
        "Navigate to docs.praison.ai",
    ]
    for task in tasks:
        result = await agent.run(task)
        print(f"✅ {task}: {result['status']}")
        await asyncio.sleep(2)  # Pause between tasks

asyncio.run(run_multiple_tasks())
Headless Screenshot
from praisonai.browser.playwright_agent import PlaywrightBrowserAgent
async def take_screenshot():
    agent = PlaywrightBrowserAgent(
        headless=True,
        browser_type="chromium",
    )
    result = await agent.run(
        "Go to docs.praison.ai and take a screenshot",
        start_url="https://docs.praison.ai",
    )
Page Scraping
from praisonai.browser.cdp_utils import get_pages, read_page, execute_js
async def scrape_page():
    pages = await get_pages()
    # Find the right page (None if no open tab matches)
    target = next((p for p in pages if "google" in p.url.lower()), None)
    if target is None:
        raise RuntimeError("No open tab with 'google' in its URL")
    # Get text content
    content = await read_page(target.id)
    # Or execute custom JS
    links = await execute_js(
        target.id,
        """
        Array.from(document.querySelectorAll('a'))
            .map(a => ({href: a.href, text: a.textContent.trim()}))
            .slice(0, 10)
        """
    )
    return links
Troubleshooting
Debug CLI Commands
# Check server health
praisonai browser doctor
# List all sessions with step counts
praisonai browser sessions --limit 10
# View session history (step-by-step)
praisonai browser history <session_id>
# Reload extension after code changes
praisonai browser reload
# Run with debug mode
praisonai browser run "goal" --debug
Session Tracking
Both agent-side and server-side sessions are tracked in the same SQLite database:
# Database location
~/.praisonai/browser_sessions.db
# Tables:
# sessions: session_id, goal, status, started_at, current_url
# steps: session_id, step_number, observation, action, thought, created_at
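Because the session store is a plain SQLite file, it can be inspected with Python's built-in sqlite3 module. The sketch below uses the table and column names listed above; the column types, the helper names recent_sessions and session_steps, and the demo rows are assumptions, and the demo runs against an in-memory database so it does not touch the real file.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple

# Location given above; pass it to sqlite3.connect() to read real data.
DB_PATH = Path.home() / ".praisonai" / "browser_sessions.db"


def recent_sessions(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple]:
    """Newest sessions first, as (session_id, goal, status) rows."""
    return conn.execute(
        "SELECT session_id, goal, status FROM sessions "
        "ORDER BY started_at DESC LIMIT ?",
        (limit,),
    ).fetchall()


def session_steps(conn: sqlite3.Connection, session_id: str) -> List[Tuple]:
    """Steps of one session in order, as (step_number, action) rows."""
    return conn.execute(
        "SELECT step_number, action FROM steps "
        "WHERE session_id = ? ORDER BY step_number",
        (session_id,),
    ).fetchall()


# Demo schema with the documented column names; the types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (session_id TEXT, goal TEXT, status TEXT,
                       started_at REAL, current_url TEXT);
CREATE TABLE steps (session_id TEXT, step_number INTEGER, observation TEXT,
                    action TEXT, thought TEXT, created_at REAL);
""")
conn.execute(
    "INSERT INTO sessions VALUES ('s1', 'demo goal', 'completed', 1.0, 'https://example.com')"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?)",
    [("s1", 1, "{}", "click", "", 1.2), ("s1", 0, "{}", "type", "", 1.1)],
)
print(recent_sessions(conn))
print(session_steps(conn, "s1"))
```

A session with a row in sessions but no rows in steps corresponds to the "Session timeout (0 steps)" symptom, which usually points at the extension not being connected.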
Extension Mode Issues
| Issue | Cause | Solution |
|---|---|---|
| “Another debugger attached” | Previous session not cleaned up | praisonai browser reload or restart Chrome |
| Session timeout (0 steps) | Extension not connected | Check praisonai browser doctor, reload extension |
| Actions not executing | CDP commands failing | Enable --debug mode, check console logs |
| Back-to-back sessions fail | Extension mode limitation | Use CDP mode: --engine cdp |
| Side panel not loading | Unsupported Chrome version | Check minimum_chrome_version in manifest.json |
CDP Mode Issues
| Issue | Cause | Solution |
|---|---|---|
| Connection refused | Chrome not started with debug port | Start with --remote-debugging-port=9222 |
| Page not found | Invalid page ID | Run praisonai browser pages |
| Timeout | Page still loading | Use wait_for_element() |
| No pages returned | Chrome not running | Start Chrome with debug flag |
Playwright Mode Issues
| Issue | Cause | Solution |
|---|---|---|
| Browser not installed | Missing Playwright browsers | playwright install chromium |
| Selector not found | Wrong selector or slow page | Add delays or use wait_for_selector() |
Chrome Extension Console Logs
- Go to chrome://extensions
- Find “PraisonAI Browser Agent”
- Click “Service worker” link to open DevTools
- Check Console for [PraisonAI] and [Bridge] logs
Common Log Patterns
# Successful click:
[Bridge] Clicking element: #submit
# Failed click:
[Bridge] Click failed: All click methods failed for: #submit
# Session cleanup:
[PraisonAI] Cleaning up previous session (tab 12345)...
# Observation sent:
[PraisonAI] Step 0: https://google.com
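When a console log is long, the patterns above can be classified with a few regular expressions. The sketch below matches only the four line shapes shown in this section; the category names and the classify helper are made up for the example, and real logs may contain other formats.

```python
import re
from collections import Counter
from typing import Dict, Tuple

PATTERNS = {
    "click": re.compile(r"^\[Bridge\] Clicking element: (?P<selector>.+)$"),
    "click_failed": re.compile(r"^\[Bridge\] Click failed: .* for: (?P<selector>.+)$"),
    "cleanup": re.compile(
        r"^\[PraisonAI\] Cleaning up previous session \(tab (?P<tab>\d+)\)"
    ),
    "step": re.compile(r"^\[PraisonAI\] Step (?P<step>\d+): (?P<url>\S+)$"),
}


def classify(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (category, captured fields) for one console log line."""
    text = line.strip()
    for kind, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            return kind, match.groupdict()
    return "other", {}


log = [
    "[Bridge] Clicking element: #submit",
    "[Bridge] Click failed: All click methods failed for: #submit",
    "[PraisonAI] Cleaning up previous session (tab 12345)...",
    "[PraisonAI] Step 0: https://google.com",
]
counts = Counter(classify(line)[0] for line in log)
print(dict(counts))
```

Counting click_failed entries per selector is a quick way to spot an element the agent keeps retrying.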
Action Verification
All actions return { success, error }:
- success=true: CDP operation completed
- success=false: Error message indicates what failed
The error is sent in the next observation, allowing the LLM to retry or try alternative actions.
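The feedback path described here can be sketched as follows: the result of the last action is folded into the next observation, and a failure becomes a hint in the prompt. The field names last_action and last_action_error and both helper functions are hypothetical; only the { success, error } shape comes from the text above.

```python
from typing import Any, Dict


def next_observation(
    page_state: Dict[str, Any],
    last_action: Dict[str, Any],
    result: Dict[str, Any],
) -> Dict[str, Any]:
    """Attach the outcome of the previous action to the new observation."""
    observation = dict(page_state)
    observation["last_action"] = last_action
    observation["last_action_error"] = (
        None if result.get("success") else result.get("error", "unknown error")
    )
    return observation


def prompt_hint(observation: Dict[str, Any]) -> str:
    """Text appended to the LLM prompt when the previous action failed."""
    error = observation.get("last_action_error")
    if not error:
        return ""
    return f"The previous action failed: {error}. Retry or try a different approach."


failed = next_observation(
    {"url": "https://google.com"},
    {"action": "click", "selector": "#submit"},
    {"success": False, "error": "All click methods failed for: #submit"},
)
print(prompt_hint(failed))
```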