# Control Your Browser

Your employees can navigate websites, fill forms, click buttons, and extract data using a real browser on your computer.

## TL;DR

Employees drive a real Chromium-based browser on your desktop: they navigate, click, fill forms, and take screenshots. They use **the same browser you use** (your system default, as long as it is Chromium-based) so the engine, extensions, and behavior match. The browser runs with its own dedicated sign-in profile, so the agent's logins live separately from your live browser windows. Requires the Desktop Companion app (the same app that powers [Computer Control](/en/guide/capabilities/computer)).

**One app, two capabilities.** Download the Desktop Companion once and your employees get both Browser Control and Computer Control. You can enable or disable each one independently per employee.

## How It Works

You give the employee a task that requires browser interaction. The employee sends commands to the Desktop Companion app running on your computer, which controls a real browser. Results (screenshots and extracted data) come back to the chat.

| Step | What Happens |
|------|-------------|
| **1. You instruct** | "Go to LinkedIn and download my connections list" |
| **2. Employee plans** | Breaks the task into browser steps |
| **3. Commands sent** | Instructions sent to the Desktop Companion app on your machine |
| **4. Browser acts** | The app executes: navigate, click, type, scroll |
| **5. Results return** | Screenshots and extracted data shown in chat |

## What It Can Do

| Capability | Example |
|-----------|---------|
| **Navigate websites** | "Go to our analytics dashboard and take a screenshot" |
| **Fill out forms** | "Fill in the contact form on this page with our company details" |
| **Extract data** | "Go to this competitor's pricing page and pull their plan details" |
| **Search Google** | "Search Google for 'best CRM tools 2026' and summarize the top 5 results" |
| **Monitor pages** | "Check if the deploy status page shows green" |
| **Interact with apps** | "Open our Jira board and create a new ticket for this bug" |

## How to Set It Up

Browser Control is one of two capabilities that come with the Desktop Companion app. Download it once and you get both Browser Control and [Computer Control](/en/guide/capabilities/computer).

1. **Download** the Desktop Companion app from the employee's **Tools** tab
2. **Install and run** it on your computer
3. **Grant permissions** when prompted. The app shows two switches: **Screen Recording** and **Keyboard & Mouse**. Click the Enable button next to each. macOS opens System Settings so you can flip them on
4. **Quit and reopen the app once** so the freshly granted permissions take effect (macOS caches the previous denial until a full relaunch)
5. **Click Connect.** A browser tab opens, pairs your account, and closes. The app status turns green
6. **Done.** The Tools tab in the webapp shows Ready

Already have the companion app installed for Computer Control? You're all set. Both capabilities share the same connection.

### First Task, First Time

The first time you ask an employee to do something on your machine, they will check if the companion is ready. If it is not, they will point you back here, share a download link, and wait. Once the app shows Connected and the tool row says Ready, ask again and the work starts immediately. There is no need to rephrase the instruction. The employee is simply waiting on the app.

Once connected, you can manage the tool per employee:

1. Select the employee
2. Click the **Tools** tab
3. Find "Browser Controller" in the Actions section

| Action | What it does | How |
|--------|-------------|-----|
| **Enable / Disable** | Controls whether the employee can use this tool | Toggle the switch |
| **Tool Rules** | Custom instructions that guide how the employee uses this specific tool, e.g. "always take a screenshot after each action" or "never navigate away from our domain" | Expand the tool, then write your rules in the text field |
| **Delete** | Permanently removes the tool from the employee | Click the delete button |

## Tips & Tricks

- **Be specific about the task.** "Go to Google Analytics, navigate to the Audience Overview report for the last 7 days, and screenshot it" beats "check analytics"
- **Sign in once inside the agent's browser.** The first time the employee needs a site that requires login, sign in from inside their browser window. The sign-in sticks across future runs because the agent keeps its own persistent profile
- **Ask for a specific browser if you need it.** Say "use Chrome for this" or "open this in Edge" and the employee switches for that task. Leave it out and they use whatever your system default is
- **Break complex flows into steps.** For multi-page workflows, describe each step clearly
- **Ask for screenshots.** The employee takes screenshots automatically, but you can request them at specific points for verification

## Which Browsers Work

The employee drives the browser you use every day, as long as it's Chromium-based. Supported out of the box on macOS, Windows, and Linux:

| Browser | Supported |
|---|---|
| **Brave** | Yes |
| **Google Chrome** | Yes |
| **Microsoft Edge** | Yes |
| **Arc** | Yes |
| **Vivaldi** | Yes |
| **Opera** | Yes |
| **Chromium** (open-source) | Yes |
| **Safari** | Not supported — different engine |
| **Firefox** | Not supported — different engine |

If your system default browser is one of the supported ones, the employee picks it automatically. If your default is Safari or Firefox, the employee falls back to the first Chromium browser it finds on your computer, so the experience still works, just not in your favorite browser. If no Chromium-based browser is installed at all, a bundled Chromium is used as a last resort.

**Asking for a specific one:** say "use Chrome" or "use Edge for this" in chat and the employee switches for that task only. Your default stays put. If the named browser isn't installed, the employee quietly falls back to your default rather than erroring out.
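
The selection order described above can be sketched in a few lines of Python. This is only an illustration of the documented fallback rules, not the companion app's actual code. The function name `pick_browser`, its parameters, and the `BUNDLED` label are assumptions made for the example.

```python
from typing import Optional, Sequence

# Browsers this guide lists as supported (all Chromium-based).
SUPPORTED = ("Brave", "Google Chrome", "Microsoft Edge", "Arc",
             "Vivaldi", "Opera", "Chromium")
BUNDLED = "Bundled Chromium"  # hypothetical label for the built-in fallback


def pick_browser(system_default: str,
                 installed: Sequence[str],
                 requested: Optional[str] = None) -> str:
    """Return the browser to drive for one task (illustrative only)."""
    chromium_installed = [b for b in installed if b in SUPPORTED]

    # 1. A browser named in chat wins, if it is supported and installed.
    if requested in chromium_installed:
        return requested

    # 2. Otherwise use the system default when it is Chromium-based.
    if system_default in chromium_installed:
        return system_default

    # 3. Default is Safari or Firefox: take the first Chromium browser found.
    if chromium_installed:
        return chromium_installed[0]

    # 4. Nothing Chromium-based is installed: bundled Chromium, last resort.
    return BUNDLED


# A per-task request does not change the default for later tasks.
print(pick_browser("Brave", ["Brave", "Google Chrome"], requested="Google Chrome"))
print(pick_browser("Brave", ["Brave", "Google Chrome"]))
```

Note that a request for a browser that is not installed simply falls through to step 2, which matches the "quietly falls back" behavior.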

## Behind the Scenes

| | |
|---|---|
| **Powered by** | Playwright driving a real Chromium browser (via Desktop Companion app) |
| **How it works** | Commands are sent via WebSocket to the companion app, which controls the browser on your machine |
| **Auth** | The employee has a dedicated sign-in profile. Sign in once inside the agent's browser and it stays signed in. Your live browser stays untouched |
| **Vision-based** | The employee sees the page via screenshots and reasons about what to click/type |
| **Platforms** | macOS, Windows, Linux |
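
For the technically curious, here is a rough sketch of what a command travelling over the WebSocket might look like. The real message format is not documented in this guide, so every field name here (`id`, `action`, `target`, `text`) is an assumption chosen for illustration. The sketch only shows the general idea of serializing a command to a JSON text frame and rebuilding it on the other side.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BrowserCommand:
    """Hypothetical command shape; the real protocol may differ."""
    id: int                        # correlation id so a result can be matched
    action: str                    # e.g. "navigate", "click", "type", "scroll"
    target: Optional[str] = None   # a URL or a description of the element
    text: Optional[str] = None     # text to type, if the action needs it


def encode(cmd: BrowserCommand) -> str:
    """Serialize a command as the JSON text a WebSocket frame could carry."""
    return json.dumps(asdict(cmd))


def decode(frame: str) -> BrowserCommand:
    """Rebuild the command on the receiving (companion app) side."""
    return BrowserCommand(**json.loads(frame))


frame = encode(BrowserCommand(id=1, action="navigate",
                              target="https://example.com"))
print(frame)
print(decode(frame))
```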

### Browser Controller vs Web Scraper vs Web Search

| | **Browser Controller** | **Web Scraper** | **Web Search** |
|---|---|---|---|
| **Purpose** | Interact with web pages (click, type, navigate) | Read full content from a URL | Find information via search engine |
| **Input** | Task instructions | A URL | A question or topic |
| **Output** | Screenshots, extracted data, completed actions | Clean page text | Titles, snippets, and source links |
| **Auth content** | Yes, once you sign in inside the agent's browser | No, public pages only | No, public results only |
| **JavaScript** | Full support, real browser | Limited fallback | N/A |
| **Best for** | "Do this thing on that website" | "Read this page for me" | "Find info about X" |
| **Requires** | Desktop Companion app | Nothing | Nothing |

The employee picks the right tool automatically. For simple reads, they use Web Scraper. For searches, Web Search. Browser Controller is reserved for when real interaction is needed.
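
As a rough mental model, the choice follows the comparison table above. The sketch below is an assumption about how that decision could be expressed, not the employee's actual logic. The function `pick_tool` and its three flags are hypothetical.

```python
def pick_tool(needs_interaction: bool, needs_login: bool, has_url: bool) -> str:
    """Map a task's needs to one of the three tools (illustrative only)."""
    # Clicking, typing, or signed-in content needs a real browser.
    if needs_interaction or needs_login:
        return "Browser Controller"
    # A known public URL that only needs reading goes to the scraper.
    if has_url:
        return "Web Scraper"
    # Otherwise the task is a question or topic to look up.
    return "Web Search"


print(pick_tool(needs_interaction=False, needs_login=False, has_url=True))
```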

## What It Costs

| | |
|---|---|
| **Cost** | Runtime credits based on processing time |

## Is It Safe

- **No password entry.** The employee will never type passwords or interact with security dialogs. Sign in yourself the first time — either in the agent's browser window when it opens a login page, or by asking them to open a site so you can sign in before they start real work
- **Separate sign-in profile.** The agent's browser keeps its own cookies and logged-in sessions, completely isolated from your regular browsing. Anything you've signed into in your everyday browser is NOT automatically available to the employee. If you want the employee to access a site, sign into it once inside the agent's browser — the session sticks from then on

## Good to Know

- **Desktop Companion required.** Browser Control only works when the companion app is installed and running on your computer. Without it, the tool is unavailable. The same app also provides [Computer Control](/en/guide/capabilities/computer)
- **Screenshots saved to Drive.** Every screenshot taken during browser control is persisted to the employee's Drive tab
- **Complex layouts.** The employee sees the page via screenshots and reasons about what to click/type. Complex layouts may occasionally need clarification

## Frequently Asked Questions

**Q: Does Browser Controller work on any website?**
A: Yes. It controls a real browser, so it works on any site that loads in a standard browser, including JavaScript-heavy SPAs, dashboards, and authenticated apps.

**Q: Can the employee log into sites for me?**
A: No. The employee will never type passwords or handle login flows. The first time the employee opens a site that needs a login, its browser shows you the login page — sign in once, and the session persists inside the agent's browser profile from then on.

**Q: Does the employee have access to all the sites I'm already logged into?**
A: No. The agent's browser runs with its own dedicated profile, completely separate from your everyday browsing. This is intentional: it prevents the agent from clashing with your open browser windows and keeps your personal session isolated from the agent's. Sign in once, inside the agent's browser, per site you want it to use.

**Q: Which browser does the employee use?**
A: Your system default, if it's Chromium-based: Brave, Chrome, Edge, Arc, Vivaldi, Opera, or Chromium. You can also ask for a specific one per task ("use Chrome for this") and it switches just for that task. If your default is Safari or Firefox, the employee uses the first Chromium browser it finds installed on your computer instead.

**Q: What if I only have Safari or Firefox?**
A: The employee falls back to a bundled Chromium browser that ships inside the companion app. It still works, just not in your preferred browser. If you later install Chrome, Brave, or Edge, the employee starts using it the next time it runs.

**Q: Is Browser Controller available on mobile?**
A: No. It requires the Desktop Companion app running on macOS, Windows, or Linux.

**Q: What happens if the companion app disconnects?**
A: The tool becomes unavailable. The employee will let you know and suggest alternatives (Web Search or Web Scraper for simpler tasks). Reconnect the companion app to resume.

**Q: I granted permissions but the employee still says it can't see my screen. Why?**
A: macOS caches the previous denial. Quit the companion app fully and reopen it. The first action after reopening picks up the new permission state. This is a one-time step. After that, the permission sticks across restarts.

**Q: What permissions does the app actually need?**
A: Only two, and only if the employee needs to use your browser or computer: **Screen Recording** so they can see what is on screen, and **Keyboard & Mouse** (Accessibility on macOS) so they can click and type. Both are grantable from the app's Permissions section with one click each. Nothing else is requested.

**Q: Can multiple employees use Browser Controller at the same time?**
A: Not in parallel. They share the same companion app connection, so browser tasks are processed sequentially. If several employees request browser work at once, the requests are queued and run one after another.
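
To picture sequential processing over one shared connection, here is a minimal sketch using Python's `asyncio.Lock`. It is an illustration of the idea only, under the assumption that a single lock guards the connection. The names `run_browser_task` and `main` are hypothetical, and the short sleep stands in for real browser work.

```python
import asyncio
from typing import List


async def run_browser_task(lock: asyncio.Lock, employee: str,
                           log: List[str]) -> None:
    # Only one task at a time may hold the shared companion connection.
    async with lock:
        log.append(f"{employee} start")
        await asyncio.sleep(0.01)  # stand-in for real browser work
        log.append(f"{employee} done")


async def main() -> List[str]:
    lock = asyncio.Lock()
    log: List[str] = []
    # Three employees ask for browser work at the same moment.
    await asyncio.gather(
        *(run_browser_task(lock, name, log) for name in ("A", "B", "C"))
    )
    return log


log = asyncio.run(main())
print(log)
```

Each employee's "start" is always followed by its own "done" before the next one begins, which is what sequential processing means here.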