
Overview

NanoClaw agents have full access to the web through the agent-browser tool. This enables:
  • Web search: Find information across the internet
  • Content extraction: Read articles, documentation, and web pages
  • Form automation: Fill out forms and submit data
  • Web scraping: Extract structured data from websites
  • Screenshot capture: Save visual snapshots of pages
  • Authentication: Save and reuse login sessions
The agent-browser skill is available to all agents automatically. No configuration required.

Quick start

Ask the agent to use the web naturally:
@Andy what are the top stories on Hacker News today?
@Andy search for recent Claude API updates
@Andy read this article and summarize it: https://example.com/article
@Andy extract all product prices from https://store.example.com
The agent will use agent-browser commands automatically.

Browser automation workflow

The typical workflow for browser tasks:

1. Navigate to page

agent-browser open https://example.com

2. Take snapshot

Get interactive elements with references:
agent-browser snapshot -i
Output:
textbox "Email" [ref=e1]
textbox "Password" [ref=e2]  
button "Sign In" [ref=e3]

3. Interact with elements

Use the refs from the snapshot:
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3

4. Wait and verify

agent-browser wait --url "**/dashboard"
agent-browser snapshot -i  # Verify we're logged in

Core commands

# Navigate to a URL
agent-browser open https://example.com

# Go back in browser history
agent-browser back

Page analysis

The snapshot command returns the page’s accessibility tree:
# Interactive elements only (recommended)
agent-browser snapshot -i

# Full accessibility tree
agent-browser snapshot

# Compact output
agent-browser snapshot -c

# Limit depth
agent-browser snapshot -d 3

# Scope to specific area
agent-browser snapshot -s "#main-content"
Example output:
heading "Welcome" [ref=e1]
textbox "Search" [ref=e2] placeholder="Type to search..."
button "Search" [ref=e3]
link "Documentation" [ref=e4] href="/docs"
link "API Reference" [ref=e5] href="/api"

Interactions

Use @ref from snapshots to interact:
agent-browser click @e1           # Click
agent-browser dblclick @e1        # Double-click  
agent-browser hover @e1           # Hover

Get information

agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get count ".item"   # Count matching elements

Wait conditions

agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text
agent-browser wait --url "**/dashboard"    # Wait for URL pattern
agent-browser wait --load networkidle      # Wait for network idle

Screenshots and PDF

# Save screenshot
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full   # Full page

# Save as PDF
agent-browser pdf output.pdf

Advanced features

Semantic locators

Instead of refs, you can find elements semantically:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
Useful when you want to interact without taking a snapshot first.

Saved authentication state

Login once, then reuse the session:

1. Login and save state

agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

2. Load saved state later

agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
# Already logged in!
State files are saved in the group folder and persist between sessions.
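A common pattern is to check for the saved state file before logging in again. A minimal sketch in shell; the `agent-browser` calls are shown as comments, and the temp-directory path is only for illustration (in practice the file lives in the group folder as described above):

```shell
# Reuse a saved session when one exists, otherwise log in fresh.
# Temp dir used here so the sketch is self-contained.
state_file="$(mktemp -d)/auth.json"

if [ -f "$state_file" ]; then
  echo "restoring saved session"
  # agent-browser state load auth.json
else
  echo "no saved session; performing full login"
  # agent-browser open https://app.example.com/login
  # ... fill credentials, click sign-in, wait for dashboard ...
  # agent-browser state save auth.json
fi
```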

Cookies and storage

# Cookies
agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies

# LocalStorage
agent-browser storage local               # Get localStorage
agent-browser storage local set k v       # Set value

JavaScript execution

Run arbitrary JavaScript:
agent-browser eval "document.title"
agent-browser eval "localStorage.getItem('token')"
agent-browser eval "document.querySelectorAll('.item').length"

Example use cases

Daily news aggregation

@Andy every morning at 8am, check Hacker News and TechCrunch for 
AI-related articles and send me a summary
The agent will:
  1. Open Hacker News
  2. Extract top stories
  3. Filter for AI topics
  4. Open TechCrunch
  5. Extract AI articles
  6. Compile and send summary
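The filtering step (3) can be sketched in plain shell. The headlines below are hypothetical stand-ins for text the agent would extract with `agent-browser get text`:

```shell
# Hypothetical extracted headlines (in practice piped from agent-browser)
printf '%s\n' \
  "Show HN: An open-source LLM evaluation harness" \
  "Rust 1.99 released with faster builds" \
  "Startup raises round for AI-powered search" > stories.txt

# Keep only AI-related headlines (case-insensitive whole-word match)
grep -iwE 'ai|llm|gpt|machine learning' stories.txt
```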

Form submission

@Andy fill out the contact form at https://example.com/contact with 
my name, email, and this message: "Interested in a demo"
agent-browser open https://example.com/contact
agent-browser snapshot -i
# Output: textbox "Name" [ref=e1], textbox "Email" [ref=e2], 
#         textarea "Message" [ref=e3], button "Submit" [ref=e4]

agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Interested in a demo"
agent-browser click @e4
agent-browser wait --text "Thank you"

Data extraction

@Andy extract all product names and prices from https://store.example.com
agent-browser open https://store.example.com
agent-browser snapshot -i
# Find all product elements
agent-browser get count ".product"
# Extract data from each
agent-browser get text @e1  # Product name
agent-browser get text @e2  # Price
# ... repeat for all products

Monitoring and alerts

@Andy check https://status.example.com every hour and alert me if 
the status changes from "All Systems Operational"
Scheduled task:
agent-browser open https://status.example.com
agent-browser snapshot
agent-browser get text @e1  # Status element
# Compare to expected value and send alert if different
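The comparison step can be sketched in plain shell; the `status` value here is hard-coded for illustration, where in practice it would come from `agent-browser get text`:

```shell
expected="All Systems Operational"

# In practice: status="$(agent-browser get text @e1)"
status="Partial Outage: API"

if [ "$status" != "$expected" ]; then
  echo "ALERT: status is now '$status'"
else
  echo "OK: no change"
fi
```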

Research and screenshots

@Andy research the top 3 Y Combinator startups from this batch and 
take screenshots of their landing pages
agent-browser open https://ycombinator.com/companies
agent-browser snapshot -i
# Find top 3 companies
agent-browser click @e1  # First company
agent-browser screenshot startup-1.png
agent-browser back
# Repeat for others

Access from scheduled tasks

Browser automation works in scheduled tasks:
@Andy every Monday at 9am, check my GitHub notifications and summarize 
any new issues assigned to me
The scheduled task can:
  1. Open GitHub
  2. Load saved auth state
  3. Navigate to notifications
  4. Extract issue data
  5. Send summary via IPC

Browser availability

The agent-browser skill is loaded automatically for all agents:
// From src/container-runner.ts — skills sync
const skillsSrc = path.join(process.cwd(), 'container', 'skills');
const skillsDst = path.join(groupSessionsDir, 'skills');
if (fs.existsSync(skillsSrc)) {
  for (const skillDir of fs.readdirSync(skillsSrc)) {
    const srcDir = path.join(skillsSrc, skillDir);
    if (!fs.statSync(srcDir).isDirectory()) continue;
    const dstDir = path.join(skillsDst, skillDir);
    fs.cpSync(srcDir, dstDir, { recursive: true });
  }
}
Operational skills from container/skills/ are synced to each group’s .claude/skills/ directory.

Browser persistence

Browser state (cookies, localStorage, auth) is saved per-group:
groups/
  main/
    browser-state.json
  family-chat/
    browser-state.json
Each group’s browser sessions are isolated from other groups.

Performance considerations

Headless mode

The browser runs in headless mode (no GUI) for efficiency:
# Configured in agent-browser implementation
--headless=new

Resource usage

Browser automation is resource-intensive:
  • Each browser instance uses ~100-200MB RAM
  • Multiple concurrent browsers multiply this
  • Consider reducing MAX_CONCURRENT_CONTAINERS if memory is limited
// From src/config.ts
export const MAX_CONCURRENT_CONTAINERS = 5;
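A rough worst-case memory budget, using the upper end of the per-browser estimate and the default container limit above:

```shell
per_browser_mb=200   # upper end of the ~100-200MB estimate
max_containers=5     # MAX_CONCURRENT_CONTAINERS default
echo "Peak browser RAM: $((per_browser_mb * max_containers)) MB"
```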

Timeouts

Browser operations respect container timeout:
// From src/config.ts  
export const CONTAINER_TIMEOUT = 1800000; // 30 minutes
Long-running scraping tasks may hit this limit. Increase if needed:
export CONTAINER_TIMEOUT=3600000  # 1 hour
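Since the timeout is specified in milliseconds, a quick sanity check of the conversion:

```shell
container_timeout_ms=3600000
echo "$((container_timeout_ms / 1000 / 60)) minutes"  # 3,600,000 ms = 60 minutes
```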

Debugging browser automation

Enable verbose logging

Ask the agent to show browser commands:
@Andy show me the exact agent-browser commands you're running

Save screenshots for debugging

agent-browser screenshot debug-step-1.png
# ... do something
agent-browser screenshot debug-step-2.png

Check browser state files

ls -la groups/main/*.json
cat groups/main/auth.json  # Inspect saved state

Limitations

Current limitations:
  • JavaScript-heavy sites: May have timing issues with dynamic content
  • CAPTCHAs: Cannot solve CAPTCHAs automatically
  • Rate limiting: Aggressive scraping may trigger rate limits
  • Complex interactions: Some advanced UI patterns may not work

Best practices

When to use browser automation

Good use cases:
  • Extracting data from websites without APIs
  • Monitoring page changes
  • Automating form submissions
  • Taking screenshots for documentation
  • Testing web interfaces
Avoid for:
  • Sites with good APIs (use the API instead)
  • Real-time data (too slow)
  • Sites with aggressive anti-bot protection
  • Critical workflows (too fragile)

Optimize for reliability

  1. Use waits: Don’t assume instant page loads
    agent-browser wait --load networkidle
    agent-browser wait @e1  # Wait for element
    
  2. Take snapshots frequently: Re-snapshot after navigation
    agent-browser click @e1
    agent-browser wait --load networkidle  
    agent-browser snapshot -i  # Fresh refs
    
  3. Save state for auth: Avoid re-login on every run
    agent-browser state save auth.json
    
  4. Handle errors gracefully: Check if elements exist before interacting
    agent-browser get count ".error-message"
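The existence check in step 4 can be turned into a guard. The count is hard-coded here for illustration; in practice it would come from `agent-browser get count`:

```shell
# In practice: error_count="$(agent-browser get count '.error-message')"
error_count=1

if [ "$error_count" -gt 0 ]; then
  echo "errors present; capturing screenshot before retrying"
  # agent-browser screenshot error.png
else
  echo "no errors; proceeding"
fi
```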
    

Skill documentation

Full agent-browser skill documentation: container/skills/agent-browser/SKILL.md
Last modified on March 18, 2026