# Browser Agent Module

> AI-powered browser automation using Chrome Extension and PraisonAI agents

Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.

## Quick Start

### 1. Start the Bridge Server

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser start --port 8765 --model gpt-4o
```

### 2. Load the Chrome Extension

1. Open `chrome://extensions`
2. Enable "Developer mode"
3. Load unpacked extension from `praisonai-chrome-extension/dist`

### 3. Use the Side Panel

Click the extension icon or press `Ctrl+Shift+P` to open the side panel. Enter your goal, and the agent takes control of the browser to carry it out.

## Architecture

**Flow:** Chrome Extension ↔ WebSocket ↔ Bridge Server ↔ PraisonAI Agent

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
graph TB
    A[Chrome Extension] --> B[Side Panel UI]
    B --> C[Bridge Client<br/>WebSocket]
    C --> D[Bridge Server<br/>FastAPI]
    D --> E[BrowserAgent<br/>PraisonAI]
    E --> F[LLM API<br/>GPT/Gemini]
    D --> G[SessionManager<br/>SQLite]
    
    H[Built-in AI<br/>Gemini Nano] --> B
    B --> H
```

The system consists of:

* **Chrome Extension**: Captures page state and executes actions via CDP
* **Bridge Server**: FastAPI WebSocket server that routes messages to agents
* **BrowserAgent**: PraisonAI agent that decides actions based on observations
* **SessionManager**: SQLite-based persistence for session history
* **Hybrid Mode**: Falls back to on-device Gemini Nano if the bridge server is unavailable

### Session Flow

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
sequenceDiagram
    participant U as User
    participant E as Extension
    participant S as Server
    participant L as LLM
    
    U->>E: Enter goal
    E->>S: start_session(goal)
    S->>E: session_id
    loop Until done or max_steps
        E->>S: observation(page_state, elements, action_history)
        S->>L: Build prompt with goal + context
        L->>S: action JSON
        S->>E: action(click/type/etc)
        E->>E: Execute via CDP
        E->>E: Track success/error
    end
    E->>U: Task complete
```
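
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the bridge server's actual code: `run_session`, `demo_decide`, and the three callables are hypothetical names, and the extension and LLM are replaced by stubs so the control flow can be read (and run) in isolation.

```python
from typing import Callable, Dict, List

Observation = Dict[str, object]
Action = Dict[str, object]


def run_session(
    goal: str,
    observe: Callable[[], Observation],
    decide: Callable[[str, Observation, List[Action]], Action],
    execute: Callable[[Action], bool],
    max_steps: int = 20,
) -> str:
    """Run the observe -> decide -> execute loop and return the final state."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()                      # extension captures page state
        action = decide(goal, observation, history)  # server asks the LLM for an action
        if action.get("action") == "done":
            return "completed"
        success = execute(action)                    # extension executes the action via CDP
        history.append({**action, "success": success})
    return "failed"  # step budget exhausted without a "done" action


# Stub standing in for the LLM (hypothetical): click twice, then finish.
def demo_decide(goal: str, observation: Observation, history: List[Action]) -> Action:
    if len(history) >= 2:
        return {"action": "done"}
    return {"action": "click", "selector": "#search"}


final_state = run_session(
    "Search for AI frameworks",
    observe=lambda: {"url": "https://www.google.com"},
    decide=demo_decide,
    execute=lambda action: True,
)
print(final_state)  # completed
```

Passing the action history back into `decide` on every step mirrors the `action_history` field that the extension sends with each observation.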

### Session States

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
stateDiagram-v2
    [*] --> created : start_session
    created --> running : first observation
    running --> running : action/observation loop
    running --> completed : done=true
    running --> failed : max_steps or error
    running --> stopped : user cancel
    completed --> [*]
    failed --> [*]
    stopped --> [*]
```
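
One way to enforce these transitions in code is a small lookup table. The sketch below copies the states and edges from the diagram; the names `SessionState`, `TRANSITIONS`, and `transition` are illustrative assumptions, not part of the PraisonAI API.

```python
from enum import Enum
from typing import Dict, Set


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    STOPPED = "stopped"


# Allowed transitions, copied from the state diagram above.
TRANSITIONS: Dict[SessionState, Set[SessionState]] = {
    SessionState.CREATED: {SessionState.RUNNING},
    SessionState.RUNNING: {
        SessionState.RUNNING,    # action/observation loop
        SessionState.COMPLETED,  # done=true
        SessionState.FAILED,     # max_steps or error
        SessionState.STOPPED,    # user cancel
    },
    SessionState.COMPLETED: set(),  # terminal
    SessionState.FAILED: set(),     # terminal
    SessionState.STOPPED: set(),    # terminal
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the target state, or raise if the diagram does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping terminal states as empty sets makes an attempt to resume a finished session fail loudly instead of silently reviving it.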

## Smart Features

### Click Fallbacks

When a click fails, the agent automatically tries these fallbacks in order:

1. **Viewport click** using `getBoundingClientRect()` + `scrollIntoView()`
2. **JavaScript click** via `element.click()`
3. **Focus + Enter** for buttons

```mermaid theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
flowchart LR
    A[Click Action] --> B{Viewport Click}
    B -->|Success| C[✅ Done]
    B -->|Fail| D{JS Click}
    D -->|Success| C
    D -->|Fail| E{Focus + Enter}
    E -->|Success| C
    E -->|Fail| F[❌ Report Error]
```
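
The fallback chain is an ordered list of strategies that stops at the first success. The real strategies run inside the extension via CDP; the Python sketch below only illustrates the control flow, and `click_with_fallbacks` and the strategy callables are hypothetical names.

```python
from typing import Callable, Iterable, Tuple

# A strategy is a (name, callable) pair; the callable returns True on success.
ClickStrategy = Tuple[str, Callable[[str], bool]]


def click_with_fallbacks(selector: str, strategies: Iterable[ClickStrategy]) -> str:
    """Try each strategy in order and return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt(selector):
                return name
        except Exception:
            # Treat an exception like a failed attempt and move to the next strategy.
            continue
    raise RuntimeError(f"All click methods failed for: {selector}")


# Stubbed strategies (assumptions): the viewport click fails, the JS click works.
demo_strategies = [
    ("viewport_click", lambda selector: False),
    ("js_click", lambda selector: True),
    ("focus_enter", lambda selector: True),
]
print(click_with_fallbacks("#search", demo_strategies))  # js_click
```

Raising only after every strategy has been tried gives the caller a single error to report back to the LLM.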

### Goal Context & Self-Correction

Every observation sent to the LLM includes:

* **Original goal**: Always visible to prevent drift
* **Action history**: Last 5 actions with success/failure status
* **Progress notes**: Summary of steps completed

### Failure Communication

When actions fail, the LLM receives explicit feedback:

```
⛔ LAST ACTION FAILED!
   Error: All click methods failed for: a.MV3Tnb
   → You MUST try a DIFFERENT approach!
```

This enables the agent to self-correct and find alternate paths.
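
A prompt builder that combines the goal, the recent action history, and the failure notice might look like the sketch below. The function name, parameters, and line layout are assumptions for illustration; only the failure banner text is taken from the example above.

```python
from typing import Dict, List, Optional


def build_context(
    goal: str,
    history: List[Dict[str, object]],
    progress_notes: str,
    last_error: Optional[str] = None,
    history_window: int = 5,
) -> str:
    """Assemble the text context sent to the LLM with each observation."""
    lines = [f"GOAL: {goal}", "RECENT ACTIONS:"]
    # Only the most recent actions are included, each with its outcome.
    for entry in history[-history_window:]:
        status = "ok" if entry.get("success") else "FAILED"
        label = f"{entry['action']} {entry.get('selector', '')}".rstrip()
        lines.append(f"  - {label} [{status}]")
    lines.append(f"PROGRESS: {progress_notes}")
    if last_error:
        lines.append("⛔ LAST ACTION FAILED!")
        lines.append(f"   Error: {last_error}")
        lines.append("   → You MUST try a DIFFERENT approach!")
    return "\n".join(lines)
```

Repeating the goal at the top of every prompt is what keeps a long session from drifting, while the bounded history window keeps the prompt size roughly constant.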

## CLI Commands

### Run Browser Agent

Execute a goal directly from CLI with live progress display:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug  # Show all WebSocket messages
```

Options:

* `--url, -u`: Start URL (default: [https://www.google.com](https://www.google.com))
* `--model, -m`: LLM model (default: gpt-4o-mini)
* `--timeout, -t`: Timeout in seconds (default: 120)
* `--debug, -d`: Debug mode - show all events

**Example Output:**

```
🚀 Starting browser agent
   Goal: Go to google and search praisonai
   Model: gpt-4o-mini

Session: 4a703667

Step 0: ▶ TYPE → textarea#APjFqb
        📍 https://www.google.com/

Step 1: ▶ CLICK

Step 2: ▶ CLICK
        📍 https://www.google.com/search?q=praisonai

✅ Task completed!
```

### Launch Browser with Goal

Launch Chrome with the extension and optionally run a goal:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Just launch Chrome with extension
praisonai browser launch

# Launch and run goal
praisonai browser launch "Go to google and search AI"

# With specific engine
praisonai browser launch "Search for AI" --engine cdp
praisonai browser launch "Search for AI" --engine extension
```

Options:

* `--url, -u`: Start URL (default: [https://www.google.com](https://www.google.com))
* `--model, -m`: LLM model (default: gpt-4o-mini)
* `--max-steps`: Maximum steps (default: 20)
* `--engine`: Automation engine: extension, cdp, auto (default: auto)
* `--debug, -d`: Debug mode with detailed logging
* `--record-video`: Record video of browser session
* `--profile`: Enable performance profiling
* `--deep-profile`: Enable deep profiling with cProfile

### Performance Profiling

Track execution time per step to identify bottlenecks:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser launch "Go to google, search for AI" --profile
```

**Example Output:**

```
📊 Performance Profile
──────────────────────────────────────────────────────────────────────
Total Time: 16.4s | Steps: 3 | Avg: 5.5s/step

Step |    LLM | Screen | Action | Verify | Stable |  Total
──────────────────────────────────────────────────────────────────────
   0 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   5.1s
   1 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   1.5s
   2 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   3.6s
──────────────────────────────────────────────────────────────────────
Total |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |  16.4s

Bottlenecks: LLM 0% | Verify 0% | Stable 0%
```
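
Per-phase timing of this kind can be collected with a small context manager. The sketch below is not the profiler that `--profile` uses; `StepProfiler` and its methods are hypothetical, and the phase names are free-form strings chosen by the caller.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Dict, Iterator, List


class StepProfiler:
    """Accumulate wall-clock time per phase for each step."""

    def __init__(self) -> None:
        self.steps: List[DefaultDict[str, float]] = []

    def start_step(self) -> None:
        # Call once at the start of every step, before any phase() block.
        self.steps.append(defaultdict(float))

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.steps[-1][name] += time.perf_counter() - start

    def bottlenecks(self) -> Dict[str, int]:
        """Return each phase's share of the total measured time, in percent."""
        totals: DefaultDict[str, float] = defaultdict(float)
        for step in self.steps:
            for name, seconds in step.items():
                totals[name] += seconds
        grand_total = sum(totals.values()) or 1.0  # avoid division by zero
        return {name: round(100 * seconds / grand_total) for name, seconds in totals.items()}
```

Wrapping each phase as `with profiler.phase("llm"): ...` keeps the timing code out of the step logic itself.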

For deep function-level profiling (cProfile):

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser launch "goal" --deep-profile
```

### Tab Management

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser tabs              # List all tabs
praisonai browser tabs --new https://google.com  # Open new tab
praisonai browser tabs --close TAB_ID    # Close tab
praisonai browser tabs --focus TAB_ID    # Focus tab
```

### Navigate

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID
```

### Execute JavaScript

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"
```

### Page Inspection (New)

Inspect browser pages without the extension:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# List all open pages
praisonai browser pages

# Get DOM tree
praisonai browser dom <PAGE_ID>

# Read page content as text
praisonai browser content <PAGE_ID>

# Capture console logs
praisonai browser console <PAGE_ID>

# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"
```

<Note>
  These commands work via CDP (Chrome DevTools Protocol) and require Chrome
  running with `--remote-debugging-port=9222`.
</Note>
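
When Chrome runs with `--remote-debugging-port=9222`, it also serves a small HTTP endpoint at `/json/list` that describes the open targets. The sketch below reads that endpoint with the standard library only. It illustrates the underlying CDP mechanism and is not necessarily how `praisonai browser pages` is implemented; `parse_pages` and `list_pages` are hypothetical helper names.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def parse_pages(raw: str) -> List[Dict[str, str]]:
    """Keep only targets of type "page" from a /json/list response body."""
    targets = json.loads(raw)
    return [
        {"id": target["id"], "title": target.get("title", ""), "url": target.get("url", "")}
        for target in targets
        if target.get("type") == "page"
    ]


def list_pages(port: int = 9222, host: str = "127.0.0.1") -> List[Dict[str, str]]:
    """Query a running Chrome instance; raises URLError if Chrome is not listening."""
    with urlopen(f"http://{host}:{port}/json/list", timeout=5) as response:
        return parse_pages(response.read().decode("utf-8"))
```

Calling `list_pages()` requires a Chrome instance already started with the debugging port open; `parse_pages` can be exercised on its own with a saved response.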

### Automation Engines

Choose different execution engines with `--engine`:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension

# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp

# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright
```

| Engine     | Extension | Headless | Multi-Browser         |
| ---------- | --------- | -------- | --------------------- |
| extension  | Required  | No       | No                    |
| cdp        | No        | Yes      | Chrome only           |
| playwright | No        | Yes      | Chrome/Firefox/WebKit |
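
The table can be read as a capability lookup. The sketch below encodes it as data and filters engines by requirement; `EngineCaps`, `ENGINES`, and `candidate_engines` are illustrative names, and the sketch assumes that "No" in the Multi-Browser column means Chrome only.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class EngineCaps(NamedTuple):
    needs_extension: bool
    headless: bool
    browsers: FrozenSet[str]


# Capabilities transcribed from the engine table above.
ENGINES: Dict[str, EngineCaps] = {
    "extension": EngineCaps(True, False, frozenset({"chrome"})),
    "cdp": EngineCaps(False, True, frozenset({"chrome"})),
    "playwright": EngineCaps(False, True, frozenset({"chrome", "firefox", "webkit"})),
}


def candidate_engines(
    browser: str = "chrome",
    headless: bool = False,
    extension_loaded: bool = True,
) -> List[str]:
    """Return the engines whose capabilities satisfy the given requirements."""
    matches = []
    for name, caps in ENGINES.items():
        if browser not in caps.browsers:
            continue
        if headless and not caps.headless:
            continue
        if caps.needs_extension and not extension_loaded:
            continue
        matches.append(name)
    return matches


print(candidate_engines(browser="firefox", headless=True))  # ['playwright']
```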

### Screenshot

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png
```

### Start Server

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser start [OPTIONS]
```

Options:

* `--port, -p`: Port to listen on (default: 8765)
* `--host, -H`: Host to bind to (default: 0.0.0.0)
* `--model, -m`: LLM model (default: gpt-4o-mini)
* `--max-steps`: Maximum steps per session (default: 20)
* `--verbose, -v`: Enable verbose logging

### List Sessions

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser sessions [OPTIONS]
```

Options:

* `--status, -s`: Filter by status (running, completed, failed)
* `--limit, -l`: Maximum sessions to show

### View History

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser history <SESSION_ID>
```

### Clear Sessions

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser clear --status completed --yes
```

### Reload Extension

Reload the Chrome extension after making changes:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser reload
praisonai browser reload --port 9222  # Custom Chrome debug port
```

### Health Diagnostics

Run health checks for the browser automation system:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser doctor          # Run all checks
praisonai browser doctor server   # Check bridge server
praisonai browser doctor chrome   # Check Chrome debugging
praisonai browser doctor extension  # Check extension loaded
praisonai browser doctor db       # Check session database
```

**Example Output:**

```
Browser Health Check

✅ Server: ok
   Connections: 1
   Sessions: 0

✅ Chrome: Chrome/131.0.6778.85
   WebSocket: ws://127.0.0.1:9222/devtools/browser/...

✅ Extension loaded
   URL: chrome-extension://fkmfdklcegbbpipbcimb...

✅ Session database
   Path: ~/.praisonai/browser_sessions.db
   Sessions: 42
   Steps: 387
```

## Python API

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonai.browser import BrowserServer, BrowserAgent

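# Option 1: start the bridge server
server = BrowserServer(port=8765, model="gpt-4o")
server.start()  # Blocks, so nothing below this line runs while the server is up

# Option 2 (instead of running the server): create an agent directly
agent = BrowserAgent(model="gpt-4o")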
action = agent.process_observation({
    "task": "Search for AI frameworks",
    "url": "https://google.com",
    "title": "Google",
    "elements": [{"selector": "#search", "tag": "input", "text": ""}]
})
```

## Session Management

```python theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
from praisonai.browser.sessions import SessionManager

manager = SessionManager()

# Create session
session = manager.create_session("Find best restaurants")
print(session["session_id"])

# List sessions
sessions = manager.list_sessions(status="running")

# Get session details with steps
details = manager.get_session(session["session_id"])
for step in details["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")
```

## Hybrid Mode (Extension)

The Chrome Extension supports hybrid mode:

1. **Bridge Mode**: Connect to PraisonAI server for cloud LLMs
2. **Built-in Mode**: Use Chrome's Gemini Nano on-device

If the bridge server is unavailable, the extension automatically falls back to the built-in AI.
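
The fallback decision amounts to a reachability check. The extension makes this decision in the browser; the Python sketch below only illustrates the logic, with `bridge_available` and `choose_mode` as hypothetical names and the default port taken from the Quick Start.

```python
import socket
from typing import Callable


def bridge_available(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the bridge server can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_mode(is_bridge_up: Callable[[], bool] = bridge_available) -> str:
    """Pick "bridge" when the server is reachable, otherwise "built-in"."""
    return "bridge" if is_bridge_up() else "built-in"
```

Accepting the probe as a parameter keeps the decision testable without a running server, for example `choose_mode(lambda: False)`.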

## Keyboard Shortcuts

| Shortcut       | Action             |
| -------------- | ------------------ |
| `Ctrl+Shift+P` | Toggle side panel  |
| `Alt+A`        | Start agent        |
| `Alt+S`        | Capture screenshot |

## Supported Actions

| Action        | Description                                       |
| ------------- | ------------------------------------------------- |
| `click`       | Click on element                                  |
| `type`        | Enter text                                        |
| `submit`      | Press Enter to submit forms                       |
| `scroll`      | Scroll page                                       |
| `navigate`    | Go to URL                                         |
| `clear_input` | Clear input field (fixes garbled/duplicated text) |
| `wait`        | Wait for page                                     |
| `screenshot`  | Capture screen                                    |
| `done`        | Task complete                                     |

## Error Detection & Recovery (v1.3+)

The agent automatically detects and recovers from errors:

### Detected Errors

* **Garbled/duplicated text** in input fields
* **Wrong page navigation** (user or browser interference)
* **Failed actions** (click not working, submit didn't fire)
* **Blocking elements** (popups, consent dialogs, login walls)

### Recovery Actions

When errors are detected, the agent will:

1. Set `error_detected: true` with a description of the problem
2. Report `input_field_value` with the text actually visible in the field
3. Use `clear_input` to fix garbled input
4. Use `navigate` to return to the correct URL if off-track
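
The recovery steps can be sketched as a function that turns an error report into follow-up actions. In the real system the LLM makes this decision from the prompt; the rule-based version below is only an illustration. `error_detected` and `input_field_value` come from the list above, while `plan_recovery`, `current_url`, and `selector` are hypothetical names.

```python
from typing import Dict, List


def plan_recovery(
    report: Dict[str, object],
    expected_url: str,
    intended_text: str,
) -> List[Dict[str, str]]:
    """Map a detected error to corrective actions (empty list if nothing is wrong)."""
    if not report.get("error_detected"):
        return []

    # Off-track: return to the expected page first and re-observe before typing.
    current_url = str(report.get("current_url", ""))
    if not current_url.startswith(expected_url):
        return [{"action": "navigate", "url": expected_url}]

    actions: List[Dict[str, str]] = []
    field_value = report.get("input_field_value")
    if field_value is not None and field_value != intended_text:
        # Garbled or duplicated text: clear the field, then type again.
        selector = str(report.get("selector", ""))
        actions.append({"action": "clear_input", "selector": selector})
        actions.append({"action": "type", "selector": selector, "text": intended_text})
    return actions
```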

### Step Timestamps

Debug mode now shows elapsed time for each step:

```bash theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
praisonai browser launch "goal" --debug
```

**Output:**

```
[+0.0s] Step 1: type → #APjFqb = "search term" (done=False)
   📝 Input field shows: "search term"
   📊 Progress: 50% [✓ on track]

[+2.3s] Step 2: submit → #APjFqb (done=False)
   📊 Progress: 75% [✓ on track]

[+4.1s] Step 3: done (done=True)
   📊 Progress: 100% [✓ on track]
```

### Performance Optimized

Action delays have been optimized for faster execution:

* Click: 200ms (was 500ms)
* Submit: 300ms (was 500ms)
* Search: 400ms (was 1000ms)

## WebSocket Protocol

Connect to `ws://localhost:8765/ws` and send/receive JSON messages:

```json theme={"theme":{"light":"vitesse-light","dark":"vitesse-dark"}}
// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}

// Send observation
{"type": "observation", "session_id": "...", "task": "...", 
 "url": "...", "elements": [...]}

// Receive action
{"type": "action", "action": "click", "selector": "#search", 
 "thought": "Clicking search button"}
```
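
A client can build and validate these messages with the standard `json` module. The sketch below covers serialization and parsing only; sending the strings requires a WebSocket client library, which is not shown. The helper names are hypothetical, and the field names follow the messages above.

```python
import json
from typing import Any, Dict, List


def start_session_message(goal: str, model: str) -> str:
    """Serialize a start_session request."""
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def observation_message(
    session_id: str,
    task: str,
    url: str,
    elements: List[Dict[str, Any]],
) -> str:
    """Serialize an observation for an existing session."""
    return json.dumps({
        "type": "observation",
        "session_id": session_id,
        "task": task,
        "url": url,
        "elements": elements,
    })


def parse_action(raw: str) -> Dict[str, Any]:
    """Decode a server message and check that it is an action."""
    message = json.loads(raw)
    if message.get("type") != "action":
        raise ValueError(f"Expected an action message, got: {message.get('type')!r}")
    return message
```

Note that the `//` comments in the listing above are annotations for the reader; the messages on the wire are plain JSON objects without comments.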

## Environment Variables

| Variable            | Description                   |
| ------------------- | ----------------------------- |
| `OPENAI_API_KEY`    | OpenAI API key for GPT models |
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude  |
| `GOOGLE_API_KEY`    | Google API key for Gemini     |
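
Before starting the server, it can help to check which keys are present. A minimal sketch follows; `PROVIDER_KEYS` and `configured_providers` are illustrative names, and only the variable names come from the table above.

```python
import os
from typing import List, Mapping, Optional

# Variable name -> provider label, transcribed from the table above.
PROVIDER_KEYS = {
    "OPENAI_API_KEY": "OpenAI",
    "ANTHROPIC_API_KEY": "Anthropic",
    "GOOGLE_API_KEY": "Google",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [label for variable, label in PROVIDER_KEYS.items() if env.get(variable)]
```

Passing a plain dictionary as `env` makes the check easy to test without touching the real environment.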
