# Talk Using Voice

Talk to your employees out loud. Click the voice button in any chat to have a real-time, hands-free voice conversation with no typing needed.

## TL;DR

Click the voice button in chat, speak naturally, and hear responses back. The full conversation is transcribed and saved in chat history just like text messages. No setup required.

## How It Works

1. **You speak.** Your microphone captures audio and streams it to the speech provider, which transcribes it to text in real time
2. **Employee processes.** The transcribed text goes through the same execution pipeline as a typed message (same skills, tools, memory, and reasoning)
3. **Employee speaks back.** The response is converted to speech sentence-by-sentence, so you start hearing the reply while the employee is still generating the rest

A live status indicator shows what is happening: listening, processing, or speaking.
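
Step 3 can be illustrated with a small sketch. The code below is not the product's implementation. It only shows the general idea of cutting a streamed text response into sentences as soon as each one is complete, so that each sentence could be handed to text-to-speech while the rest is still being generated. The function name `sentences_from_stream` and the boundary rule (a `.`, `!`, or `?` followed by whitespace) are assumptions made for this example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule for this sketch: a sentence ends at ".", "!" or "?"
# followed by whitespace. Real systems usually handle abbreviations, numbers, etc.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Hypothetical response arriving in arbitrary pieces.
    stream = ["Sure. I fou", "nd three guides! The fir", "st one is best"]
    for sentence in sentences_from_stream(stream):
        print(sentence)  # each sentence could be sent to text-to-speech here
```

Because the first sentence is emitted before the final chunk arrives, playback can begin early, which matches the behavior described in step 3.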

## What You Can Do

| Action | How |
|--------|-----|
| **Start a voice call** | Click the **voice button** in the chat input area |
| **Talk naturally** | Speak as you would to a colleague. The employee listens and responds |
| **Switch to text mid-call** | Send a text message during a voice call. Context is shared |
| **Go back to typing** | Hang up and continue the conversation in text |
| **Use all employee capabilities** | Voice messages go through the same pipeline as text. The employee can use tools, search the web, generate images, and do anything else it can do in text chat |

## How to Set It Up

**Nothing to do.** Voice is available in every conversation by default. Click the voice button and allow microphone access when prompted.

Administrators can disable voice for the entire organization from **Settings > Technical > Communications**.

## Tips and Tricks

- **Speak clearly and pause.** The system uses silence detection to know when you are done talking
- **Be specific, just like text.** "Search for React 19 migration guides and summarize the top 3" works the same way by voice
- **Use voice for brainstorming.** Thinking out loud with an employee is faster than typing for exploratory tasks
- **Switch modes freely.** Start with voice, send a text, go back to voice. The employee keeps full context across both
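
To see why pausing helps, here is a minimal sketch of silence-based end-of-turn detection. It is an illustration of the general technique only. How the product or its speech provider actually detects the end of speech is not described here, and the function name, threshold, and frame counts below are all hypothetical.

```python
from typing import Optional, Sequence


def utterance_end_index(
    frame_levels: Sequence[float],
    silence_threshold: float = 0.02,  # hypothetical loudness cutoff
    min_silent_frames: int = 25,      # hypothetical: 25 frames of 20 ms = 0.5 s
) -> Optional[int]:
    """Return the frame index at which the speaker is considered done, else None.

    The turn ends once speech has been heard and is followed by an unbroken
    run of `min_silent_frames` frames below `silence_threshold`.
    """
    heard_speech = False
    silent_run = 0
    for i, level in enumerate(frame_levels):
        if level >= silence_threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i
    return None


if __name__ == "__main__":
    # Leading silence, speech, a short pause, more speech, then a long pause.
    levels = [0.0] * 5 + [0.3] * 10 + [0.0] * 3 + [0.4] * 5 + [0.0] * 30
    print(utterance_end_index(levels))
```

In this sketch a short mid-sentence pause does not end the turn, but a longer one does, which is why a clear pause after you finish speaking leads to a faster response.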

## Behind the Scenes

| Component | Details |
|---|---|
| **Speech-to-text** | Deepgram (nova-2 model, real-time WebSocket streaming) |
| **Text-to-speech** | Deepgram (aura-2 voices, sentence-by-sentence streaming) |
| **Voice options** | 5 female and 2 male English voices, plus Spanish variants |
| **Same pipeline** | Transcribed text goes through the exact same Dispatcher and execution pipeline as text chat |
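
For readers curious what "real-time WebSocket streaming" involves, the sketch below builds a connection URL for a streaming transcription request. It assumes Deepgram's public live-transcription endpoint (`wss://api.deepgram.com/v1/listen`) and its query parameters. The specific values other than `nova-2` (language, encoding, sample rate, interim results) are illustrative assumptions, not settings confirmed for this product, and the helper name `build_listen_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: Deepgram's public live-transcription WebSocket URL.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(
    model: str = "nova-2",     # model named in the table above
    language: str = "en",      # illustrative value
    sample_rate: int = 16000,  # illustrative value
) -> str:
    """Build a streaming speech-to-text URL (a sketch, not the product's code)."""
    params = {
        "model": model,
        "language": language,
        "encoding": "linear16",     # raw 16-bit PCM audio, illustrative
        "sample_rate": sample_rate,
        "interim_results": "true",  # ask for partial transcripts while speaking
    }
    return DEEPGRAM_LISTEN_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_listen_url())
```

A client would open a WebSocket to this URL with an API key, send audio frames as they are captured, and read transcript messages as they arrive.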

## What It Costs

| Item | Cost |
|---|---|
| **Voice credits** | ~3 credits per minute (rounded up) |
| **Token credits** | Standard token credits for the employee's thinking, same as text chat |
| **Example** | A typical 5-minute call costs around 25 credits total: about 15 voice credits plus token credits for the employee's thinking |
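
The arithmetic behind the example can be sketched as follows. This is a rough estimator, not the billing logic. It assumes that "rounded up" means the call duration is rounded up to the next whole minute, and the token credit figure passed in is a hypothetical number chosen to match the example. The function name `estimate_call_credits` is also an assumption.

```python
import math

# From the table above; the documented rate is approximate (~3 per minute).
VOICE_CREDITS_PER_MINUTE = 3


def estimate_call_credits(duration_seconds: float, token_credits: int) -> int:
    """Estimate total credits for one voice call.

    Assumption: duration is rounded up to the next whole minute before the
    per-minute voice rate is applied. Token credits vary with what the
    employee does, so they are passed in rather than computed.
    """
    billed_minutes = math.ceil(duration_seconds / 60)
    voice_credits = billed_minutes * VOICE_CREDITS_PER_MINUTE
    return voice_credits + token_credits


if __name__ == "__main__":
    # A 5-minute call with a hypothetical 10 token credits.
    print(estimate_call_credits(300, token_credits=10))  # 5 * 3 + 10 = 25
```

Under this assumption, a call lasting 5 minutes and 1 second would be billed as 6 minutes of voice credits.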

## Is It Safe?

- **No audio storage.** Raw audio is streamed to the speech provider for transcription and discarded immediately afterward. No recordings are saved
- **Transcripts only.** Only the text transcript is persisted in your conversation history, same as any text message
- **Microphone access.** Your browser asks for permission before enabling the microphone. You can revoke it anytime in browser settings

## Good to Know

- **Everything is saved as text.** Voice messages are transcribed and saved in chat history. When you come back later, the full conversation is there as text
- **Mixed mode.** Voice and text live in the same conversation. Nothing is lost when you switch between them
- **Minimum 1 second.** Calls shorter than 1 second are ignored to filter accidental clicks
- **Voice can be disabled.** Administrators can turn off voice for the organization from **Settings > Technical > Communications**

## Frequently Asked Questions

**Q: Can the employee hear background noise?**
A: The speech provider uses voice activity detection to filter noise, but a quiet environment gives the best results.

**Q: What languages does voice support?**
A: English is the primary language with multiple voice options. Spanish variants are also available. More languages are on the roadmap.

**Q: Is my voice recorded?**
A: No. Audio is streamed for real-time transcription and not stored. Only the text transcript is saved.

**Q: Can I use voice on mobile?**
A: Yes, if your mobile browser supports microphone access (most do). The experience works the same way.

**Q: Does the employee use different skills or tools during voice calls?**
A: No. Voice messages go through the exact same pipeline as text. The employee has access to all the same skills, tools, and memory.