
Browser Agent Deep Dive

This guide explains how PraisonAI browser automation works under the hood, covering all execution modes, APIs, and integration patterns.

Architecture Overview


Instruction Flow (Extension Mode)

The following steps show exactly how your instruction flows through the system:

Key Points

  1. User → CLI: User provides goal via praisonai browser run "goal"
  2. CLI → Server: CLI connects over WebSocket, sends start_session
  3. Server → Extension: Server forwards start_automation to Chrome extension
  4. Extension → Page: Extension uses CDP to capture page state (DOM, screenshot)
  5. Extension → Server: Sends observation with page details
  6. Server → Agent: Passes observation to BrowserAgent
  7. Agent → LLM: Agent builds prompt with goal + context, gets action from LLM
  8. Server → Extension: Forwards action (click, type, scroll, etc.)
  9. Extension → Page: Executes action via CDP
  10. Loop: Repeats until agent returns done: true or timeout
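
For orientation, the sketch below shows roughly what these messages look like as Python dicts. Only the message types (start_session, start_automation, observation, action, done) come from the flow above; the individual field names are illustrative assumptions, not the actual wire format.

# Illustrative message shapes only -- field names are assumptions,
# not the exact wire format used by the server and extension.

start_session = {
    "type": "start_session",
    "goal": "Search for PraisonAI on Google",   # goal from the CLI
    "session_id": "my-session",
}

observation = {
    "type": "observation",
    "url": "https://google.com",
    "dom": "...",          # page state captured via CDP
    "screenshot": "...",   # screenshot data captured via CDP
}

action = {
    "type": "action",
    "action": "click",     # click, type, scroll, etc.
    "selector": "#search",
    "done": False,         # True once the agent considers the goal complete
}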

Execution Modes

Comparison Table

| Feature | Extension Mode | CDP Mode | Playwright Mode |
| --- | --- | --- | --- |
| Requires Extension | ✅ Yes | ❌ No | ❌ No |
| Headless Support | ❌ No | ✅ Yes | ✅ Yes |
| Multi-Browser | Chrome only | Chrome only | Chrome, Firefox, WebKit |
| Session Limit | 1 at a time | Unlimited | Unlimited |
| API Used | chrome.debugger | CDP WebSocket | Playwright API |
| Best For | Interactive use | Headless automation | Cross-browser testing |
| Extra Dependencies | None | aiohttp, websockets | playwright |

Extension Mode (Default)

How It Works

Backend APIs

| API | Location | Purpose |
| --- | --- | --- |
| chrome.debugger.attach() | Extension | Attach to tab for control |
| chrome.debugger.sendCommand() | Extension | Execute CDP commands |
| Runtime.evaluate | CDP | Execute JavaScript |
| Input.dispatchMouseEvent | CDP | Simulate clicks |
| Input.dispatchKeyEvent | CDP | Simulate typing |
| Page.captureScreenshot | CDP | Take screenshots |
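
As a concrete illustration, a single click issued through Input.dispatchMouseEvent is a mousePressed/mouseReleased pair. The payloads below are a sketch: the coordinates are placeholders, and how the extension derives them from the target element is not shown here.

# CDP payloads behind one click -- a mousePressed/mouseReleased pair.
# Coordinates are placeholders for the target element's position.
mouse_pressed = {
    "type": "mousePressed",
    "x": 220, "y": 340,
    "button": "left",
    "clickCount": 1,
}
mouse_released = {**mouse_pressed, "type": "mouseReleased"}
# Each dict is sent as Input.dispatchMouseEvent via chrome.debugger.sendCommand().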

CLI Usage

# Basic usage (extension mode is default)
praisonai browser run "Search for PraisonAI on Google"

# With options
praisonai browser run "task" \
    --url https://google.com \
    --model gpt-4o \
    --timeout 120 \
    --debug

Programmatic Usage

from praisonai.browser import BrowserServer, BrowserAgent

# Start server
server = BrowserServer(port=8765, model="gpt-4o-mini")
server.start_background()  # Runs in thread

# Create agent (requires extension to be connected)
agent = BrowserAgent(model="gpt-4o-mini", session_id="my-session")

# Agent is called via WebSocket, not directly
# Extension sends observations → Server calls agent → Extension executes actions

Limitations

Extension mode supports only one session at a time because Chrome’s chrome.debugger API only allows one debugger attachment per tab.

CDP Mode

How It Works

Backend APIs

| API | Purpose | Example |
| --- | --- | --- |
| GET /json | List all pages | http://localhost:9222/json |
| DOM.getDocument | Get DOM tree | {"depth": 4} |
| DOM.querySelector | Find element | {"selector": "#search"} |
| Runtime.evaluate | Execute JS | {"expression": "..."} |
| Input.dispatchMouseEvent | Click | {"type": "click", ...} |
| Input.dispatchKeyEvent | Type | {"type": "keyDown", ...} |
| Page.navigate | Go to URL | {"url": "..."} |
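
To see what these calls look like without any PraisonAI helpers, here is a minimal raw-CDP round trip using aiohttp: list pages via GET /json, open the page's WebSocket, and run Runtime.evaluate. It assumes Chrome is already running with --remote-debugging-port=9222.

import asyncio
import aiohttp

async def eval_title(port: int = 9222) -> str:
    async with aiohttp.ClientSession() as http:
        # GET /json lists every page along with its WebSocket debugger URL
        async with http.get(f"http://localhost:{port}/json") as resp:
            pages = await resp.json()
        ws_url = pages[0]["webSocketDebuggerUrl"]

        async with http.ws_connect(ws_url) as ws:
            await ws.send_json({
                "id": 1,
                "method": "Runtime.evaluate",
                "params": {"expression": "document.title", "returnByValue": True},
            })
            # Skip unrelated CDP events until the reply with our id arrives
            while True:
                msg = await ws.receive_json()
                if msg.get("id") == 1:
                    return msg["result"]["result"]["value"]

print(asyncio.run(eval_title()))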

Prerequisites

# Start Chrome with remote debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
    --remote-debugging-port=9222 \
    --user-data-dir=/tmp/chrome-debug

CLI Usage

# Use CDP mode explicitly
praisonai browser run "Search for AI" --engine cdp

# CDP-specific commands (no extension needed)
praisonai browser pages              # List all tabs
praisonai browser dom <PAGE_ID>      # Get DOM tree
praisonai browser content <PAGE_ID>  # Read page text
praisonai browser js <PAGE_ID> "document.title"  # Execute JS
praisonai browser console <PAGE_ID>  # Capture logs

Programmatic Usage

from praisonai.browser.cdp_agent import CDPBrowserAgent
from praisonai.browser.cdp_utils import get_pages, execute_js, get_dom

# Direct CDP control
async def example():
    # List pages
    pages = await get_pages(port=9222)
    for page in pages:
        print(f"{page.id}: {page.title}")
    
    # Execute JavaScript
    result = await execute_js(pages[0].id, "document.title")
    print(f"Title: {result}")
    
    # Get DOM
    dom = await get_dom(pages[0].id, depth=3)
    
    # Read page content
    from praisonai.browser.cdp_utils import read_page
    content = await read_page(pages[0].id)

# Run full automation
async def automate():
    agent = CDPBrowserAgent(
        port=9222,
        model="gpt-4o-mini",
        headless=False,
    )
    result = await agent.run("Search for PraisonAI on Google")
    print(result)

Sync Wrappers

from praisonai.browser.cdp_utils import (
    get_pages_sync,
    execute_js_sync,
    get_dom_sync,
    read_page_sync,
    get_console_sync,
)

# Synchronous usage
pages = get_pages_sync(port=9222)
title = execute_js_sync(pages[0].id, "document.title")

Playwright Mode

How It Works

Backend APIs

| Playwright API | Purpose | CDP Equivalent |
| --- | --- | --- |
| page.goto(url) | Navigate | Page.navigate |
| page.click(selector) | Click element | Input.dispatchMouseEvent |
| page.fill(selector, text) | Type text | Input.dispatchKeyEvent |
| page.screenshot() | Capture screenshot | Page.captureScreenshot |
| page.content() | Get HTML | DOM.getDocument |
| page.evaluate(fn) | Execute JS | Runtime.evaluate |
| page.wait_for_selector() | Wait for element | Custom polling |
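
For reference, here is the left-hand column in plain async Playwright, independent of PraisonAI. The URL is a placeholder, and click/fill are omitted because they need real selectors.

import asyncio
from playwright.async_api import async_playwright

async def demo():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://example.com")          # Page.navigate
        title = await page.evaluate("document.title")   # Runtime.evaluate
        await page.screenshot(path="page.png")          # Page.captureScreenshot
        html = await page.content()                     # DOM.getDocument
        await browser.close()
        print(title, len(html))

asyncio.run(demo())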

Prerequisites

# Install Playwright
pip install playwright

# Install browsers
playwright install chromium
playwright install firefox  # Optional
playwright install webkit   # Optional

CLI Usage

# Use Playwright mode
praisonai browser run "Search for AI" --engine playwright

# Headless mode
praisonai browser run "task" --engine playwright --headless

Programmatic Usage

from praisonai.browser.playwright_agent import PlaywrightBrowserAgent

async def example():
    agent = PlaywrightBrowserAgent(
        model="gpt-4o-mini",
        browser_type="chromium",  # or "firefox", "webkit"
        headless=True,
    )
    
    result = await agent.run(
        goal="Search for PraisonAI on Google",
        start_url="https://google.com",
    )
    print(result)

Page Inspection Commands

These commands work via CDP (Chrome must be running with --remote-debugging-port=9222).

CLI Reference

| Command | Description | Example |
| --- | --- | --- |
| praisonai browser pages | List all browser tabs | Shows ID, title, URL |
| praisonai browser dom <id> | Get DOM tree | --depth 4 |
| praisonai browser content <id> | Read page as text | --limit 2000 |
| praisonai browser console <id> | Capture console logs | --timeout 2 |
| praisonai browser js <id> "code" | Execute JavaScript | Returns result |

Python API

from praisonai.browser.cdp_utils import (
    get_pages,      # async: List[PageInfo]
    get_dom,        # async: Dict (DOM tree)
    read_page,      # async: str (page text)
    get_console,    # async: List[Dict] (log entries)
    execute_js,     # async: Any (JS result)
    wait_for_element,  # async: bool
)

# Synchronous versions also available:
# get_pages_sync, get_dom_sync, read_page_sync, etc.
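
A quick combined example: get_pages and get_dom are called with the same arguments used earlier in this guide, while get_console is assumed here to take just a page ID, so check its actual signature before relying on it.

import asyncio
from praisonai.browser.cdp_utils import get_pages, get_dom, get_console

async def quick_inspect():
    pages = await get_pages(port=9222)        # same data as `praisonai browser pages`
    page = pages[0]
    dom = await get_dom(page.id, depth=3)     # shallow DOM tree
    logs = await get_console(page.id)         # recent console entries (signature assumed)
    print(page.title, len(logs))

asyncio.run(quick_inspect())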

Agent Memory & Sessions

Session Isolation

Each browser session uses the Agent’s built-in session management:

from praisonai.browser import BrowserAgent

# Session is auto-generated or can be provided
agent = BrowserAgent(
    model="gpt-4o-mini",
    session_id="my-unique-session",  # For memory isolation
)

# Reset for new session (clears chat_history)
agent.reset(new_session_id="new-session-id")

Memory Flow


Common Patterns

Sequential Tasks (CDP Mode)

import asyncio
from praisonai.browser.cdp_agent import CDPBrowserAgent

async def run_multiple_tasks():
    agent = CDPBrowserAgent(port=9222, model="gpt-4o-mini")
    
    tasks = [
        "Search for AI on Google",
        "Go to GitHub and search for praisonai",
        "Navigate to docs.praison.ai",
    ]
    
    for task in tasks:
        result = await agent.run(task)
        print(f"✅ {task}: {result['status']}")
        await asyncio.sleep(2)  # Pause between tasks

asyncio.run(run_multiple_tasks())

Headless Screenshot

from praisonai.browser.playwright_agent import PlaywrightBrowserAgent

async def take_screenshot():
    agent = PlaywrightBrowserAgent(
        headless=True,
        browser_type="chromium",
    )
    
    result = await agent.run(
        "Go to docs.praison.ai and take a screenshot",
        start_url="https://docs.praison.ai",
    )

Page Scraping

from praisonai.browser.cdp_utils import get_pages, read_page, execute_js

async def scrape_page():
    pages = await get_pages()
    
    # Find the right page
    target = next(p for p in pages if "google" in p.url.lower())
    
    # Get text content
    content = await read_page(target.id)
    
    # Or execute custom JS
    links = await execute_js(
        target.id,
        """
        Array.from(document.querySelectorAll('a'))
            .map(a => ({href: a.href, text: a.textContent.trim()}))
            .slice(0, 10)
        """
    )
    return links

Troubleshooting

Debug CLI Commands

# Check server health
praisonai browser doctor

# List all sessions with step counts
praisonai browser sessions --limit 10

# View session history (step-by-step)
praisonai browser history <session_id>

# Reload extension after code changes
praisonai browser reload

# Run with debug mode
praisonai browser run "goal" --debug

Session Tracking

Both agent-side and server-side sessions are tracked in the same SQLite database:

# Database location
~/.praisonai/browser_sessions.db

# Tables:
#   sessions: session_id, goal, status, started_at, current_url
#   steps: session_id, step_number, observation, action, thought, created_at
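
Since this is an ordinary SQLite file, you can inspect it directly. A small sketch using the column names listed above:

import sqlite3
from pathlib import Path

# Open the session database and print the most recent sessions.
db = Path.home() / ".praisonai" / "browser_sessions.db"
con = sqlite3.connect(db)
for row in con.execute(
    "SELECT session_id, goal, status, started_at FROM sessions "
    "ORDER BY started_at DESC LIMIT 5"
):
    print(row)
con.close()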

Extension Mode Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| "Another debugger attached" | Previous session not cleaned up | praisonai browser reload or restart Chrome |
| Session timeout (0 steps) | Extension not connected | Check praisonai browser doctor, reload extension |
| Actions not executing | CDP commands failing | Enable --debug mode, check console logs |
| Back-to-back sessions fail | Extension mode limitation | Use CDP mode: --engine cdp |
| Side panel not loading | Invalid Chrome version | Check minimum_chrome_version in manifest.json |

CDP Mode Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Connection refused | Chrome not started with debug port | Start with --remote-debugging-port=9222 |
| Page not found | Invalid page ID | Run praisonai browser pages |
| Timeout | Page still loading | Use wait_for_element() |
| No pages returned | Chrome not running | Start Chrome with debug flag |

Playwright Mode Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Browser not installed | Missing Playwright browsers | playwright install chromium |
| Selector not found | Wrong selector or slow page | Add delays or use wait_for_selector() |

Chrome Extension Console Logs

  1. Go to chrome://extensions
  2. Find “PraisonAI Browser Agent”
  3. Click “Service worker” link to open DevTools
  4. Check Console for [PraisonAI] and [Bridge] logs

Common Log Patterns

# Successful click:
[Bridge] Clicking element: #submit

# Failed click:
[Bridge] Click failed: All click methods failed for: #submit

# Session cleanup:
[PraisonAI] Cleaning up previous session (tab 12345)...

# Observation sent:
[PraisonAI] Step 0: https://google.com

Action Verification

All actions return { success, error }:

  • success=true: CDP operation completed
  • success=false: Error message indicates what failed

The error is sent in the next observation, allowing the LLM to retry or try alternative actions.
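
As a rough illustration (the exact field names in the follow-up observation are assumptions):

# A failed click result, using the error message from the log patterns above...
action_result = {"success": False, "error": "All click methods failed for: #submit"}

# ...surfaced to the agent in the next observation so the LLM can retry
next_observation = {
    "url": "https://google.com",
    "last_action_error": action_result["error"],  # field name is an assumption
}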