Talk Using Voice
Talk to your employees out loud. Click the voice button in any chat to have a real-time, hands-free voice conversation with no typing.
TL;DR
Click the voice button in chat, speak naturally, and hear responses back. The full conversation is transcribed and saved in chat history just like text messages. No setup required.
How It Works
- You speak. Your microphone captures audio and streams it for real-time speech-to-text
- Employee processes. The transcribed text goes through the same execution pipeline as a typed message (same skills, tools, memory, and reasoning)
- Employee speaks back. The response is converted to speech sentence-by-sentence, so you start hearing the reply while the employee is still generating the rest
A live status indicator shows what is happening: listening, processing, or speaking.
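
The sentence-by-sentence behavior in the last step can be illustrated with a small sketch. It is not the product's actual implementation: the function name `sentences_from_stream` and the rule "a sentence ends at `.`, `!`, or `?` followed by whitespace" are assumptions made for illustration. The idea is that each finished sentence can be handed to text-to-speech while later text is still arriving.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at ., !, or ? followed by whitespace.
# Real systems need extra handling for abbreviations such as "e.g. ".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a text stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is finished; the last may still grow.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Hypothetical generated reply arriving in arbitrary pieces.
reply_chunks = ["Hello the", "re. How are", " you? I am", " fine."]
spoken = list(sentences_from_stream(reply_chunks))
```

Here `spoken` ends up holding three sentences, and the first one is available after only the second chunk has arrived, which is why you can hear the start of a reply before the rest has been generated.
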
What You Can Do
| Action | How |
|---|---|
| Start a voice call | Click the voice button in the chat input area |
| Talk naturally | Speak as you would to a colleague. The employee listens and responds |
| Switch to text mid-call | Send a text message during a voice call. Context is shared |
| Go back to typing | Hang up and continue the conversation in text |
| Use all employee capabilities | Voice messages go through the same pipeline as text, so the employee can use tools, search the web, generate images, and do anything else it can do in text chat |
How to Set It Up
Nothing to do. Voice is available in every conversation by default. Click the voice button and allow microphone access when prompted.
Administrators can disable voice for the entire organization from Settings > Technical > Communications.
Tips and Tricks
- Speak clearly and pause. The system uses silence detection to know when you are done talking
- Be specific, just like text. "Search for React 19 migration guides and summarize the top 3" works the same way by voice
- Use voice for brainstorming. Thinking out loud with an employee is faster than typing for exploratory tasks
- Switch modes freely. Start with voice, send a text, go back to voice. The employee keeps full context across both
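
To see why pausing matters, here is a simplified sketch of energy-based silence detection. It is a stand-in for illustration only, not the speech provider's actual algorithm. The function name `end_of_speech_index` and every threshold and timing value are hypothetical.

```python
from typing import Optional, Sequence


def end_of_speech_index(
    frame_energies: Sequence[float],
    silence_threshold: float = 0.01,  # hypothetical energy cutoff
    frame_ms: int = 20,               # hypothetical frame length
    min_silence_ms: int = 600,        # hypothetical pause that ends a turn
) -> Optional[int]:
    """Return the frame index where the speaker is considered done, or None."""
    frames_needed = min_silence_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Any speech resets the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence only counts after the user has started talking.
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None


# 10 quiet frames, 20 frames of speech, then 30 quiet frames (600 ms).
energies = [0.0] * 10 + [0.5] * 20 + [0.0] * 30
done_at = end_of_speech_index(energies)
```

A short mid-sentence pause resets nothing permanently: the counter starts over as soon as speech resumes, so only a sustained pause ends your turn.
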
Behind the Scenes
| Component | Details |
|---|---|
| Speech-to-text | Deepgram (nova-2 model, real-time WebSocket streaming) |
| Text-to-speech | Deepgram (aura-2 voices, sentence-by-sentence streaming) |
| Voice options | 5 female and 2 male English voices, plus Spanish variants |
| Same pipeline | Transcribed text goes through the exact same Dispatcher and execution pipeline as text chat |
What It Costs
| Item | Cost |
|---|---|
| Voice credits | ~3 credits per minute (rounded up) |
| Token credits | Standard token credits for the employee's thinking, same as text chat |
| Example | A typical 5-minute call costs around 25 credits total: about 15 voice credits (5 minutes × ~3) plus roughly 10 token credits |
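
The arithmetic behind the example can be sketched as follows. This is an estimate under stated assumptions, not the billing implementation: it assumes "rounded up" means the call duration is rounded up to whole minutes, it treats the approximate rate as exactly 3, and the function name and the token credit figure are hypothetical.

```python
import math

VOICE_CREDITS_PER_MINUTE = 3  # approximate rate; treated as exact here


def estimate_call_credits(duration_seconds: float, token_credits: int) -> int:
    """Estimate total credits for one call: voice credits plus token credits."""
    # Assumption: duration is rounded up to the next whole minute.
    billed_minutes = math.ceil(duration_seconds / 60)
    return billed_minutes * VOICE_CREDITS_PER_MINUTE + token_credits


# A 5-minute call with a hypothetical 10 token credits of employee thinking.
typical_call = estimate_call_credits(duration_seconds=300, token_credits=10)
```

Under these assumptions, a call lasting 61 seconds is billed as 2 minutes of voice, so short overruns cost a full extra minute of voice credits.
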
Is It Safe
- No audio storage. Raw audio is streamed to the speech provider for transcription and immediately discarded. No audio recordings are saved
- Transcripts only. Only the text transcript is persisted in your conversation history, same as any text message
- Microphone access. Your browser asks for permission before enabling the microphone. You can revoke it anytime in browser settings
Good to Know
- Everything is saved as text. Voice messages are transcribed and saved in chat history. When you come back later, the full conversation is there as text
- Mixed mode. Voice and text live in the same conversation. Nothing is lost when you switch between them
- Minimum 1 second. Calls shorter than 1 second are ignored to filter accidental clicks
- Voice can be disabled. Administrators can turn off voice for the organization from Settings > Technical > Communications
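
One way to picture mixed mode and the 1-second minimum is a single message list that holds both typed messages and voice transcripts. The sketch below is a hypothetical data model, not the product's schema: the `Message` and `Conversation` classes, the `source` field, and the method names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple

MIN_CALL_SECONDS = 1.0  # from the text: shorter calls are ignored


@dataclass
class Message:
    role: Literal["user", "employee"]
    text: str
    source: Literal["voice", "text"]  # hypothetical field, for illustration


@dataclass
class Conversation:
    messages: List[Message] = field(default_factory=list)

    def add_text(self, role: str, text: str) -> None:
        """Store a typed message."""
        self.messages.append(Message(role, text, "text"))

    def add_call(self, call_seconds: float, turns: List[Tuple[str, str]]) -> bool:
        """Store a call's transcript as ordinary messages; skip accidental clicks."""
        if call_seconds < MIN_CALL_SECONDS:
            return False
        for role, transcript in turns:
            self.messages.append(Message(role, transcript, "voice"))
        return True


conv = Conversation()
conv.add_text("user", "Here is the draft.")
accidental = conv.add_call(0.4, [("user", "oops")])
real_call = conv.add_call(12.0, [("user", "Summarize this."), ("employee", "Sure.")])
```

Only text is stored in either case, which matches the point that the full conversation is available as text when you come back later.
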
Frequently Asked Questions
Q: Can the employee hear background noise? A: The speech provider uses voice activity detection to filter noise, but a quiet environment gives the best results.
Q: What languages does voice support? A: English is the primary language with multiple voice options. Spanish variants are also available. More languages are on the roadmap.
Q: Is my voice recorded? A: No. Audio is streamed for real-time transcription and not stored. Only the text transcript is saved.
Q: Can I use voice on mobile? A: Yes, if your mobile browser supports microphone access (most do). The experience works the same way.
Q: Does the employee use different skills or tools during voice calls? A: No. Voice messages go through the exact same pipeline as text. The employee has access to all the same skills, tools, and memory.