Talk Using Voice

Talk to your employees out loud. Click the voice button in any chat and have a real-time voice conversation, hands-free, with no typing needed.

TL;DR

Click the voice button in chat, speak naturally, and hear responses back. The full conversation is transcribed and saved in chat history just like text messages. No setup required.

How It Works

You speak. Your microphone captures audio and streams it for real-time speech-to-text
Employee processes. The transcribed text goes through the same execution pipeline as a typed message (same skills, tools, memory, and reasoning)
Employee speaks back. The response is converted to speech sentence-by-sentence, so you start hearing the reply while the employee is still generating the rest

A live status indicator shows what is happening: listening, processing, or speaking.
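The sentence-by-sentence step above can be illustrated with a small sketch. It is not the product's actual implementation: the function name `sentences_from_stream` and the simple punctuation-based boundary rule are assumptions made for illustration. The idea is that text arriving in small chunks is buffered, and each complete sentence is released immediately so speech synthesis can begin before the full reply exists.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ".", "!" or "?" followed by whitespace.
# Deliberately simple; abbreviations such as "e.g." would split too early.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    reply_chunks = ["Hello the", "re. How are", " you? I am", " fine"]
    for sentence in sentences_from_stream(reply_chunks):
        print(sentence)  # each sentence could be handed to text-to-speech here
```

Because the generator yields as soon as a boundary is seen, the first sentence can be spoken while later chunks are still arriving, which is what keeps the perceived delay low.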

What You Can Do

Action	How
Start a voice call	Click the voice button in the chat input area
Talk naturally	Speak as you would to a colleague. The employee listens and responds
Switch to text mid-call	Send a text message during a voice call. Context is shared
Go back to typing	Hang up and continue the conversation in text
Use all employee capabilities	Voice messages go through the same pipeline as typed messages. The employee can use tools, search the web, generate images, and do everything else it can do in text chat

How to Set It Up

Nothing to do. Voice is available in every conversation by default. Click the voice button and allow microphone access when prompted.

Administrators can disable voice for the entire organization from Settings, Technical, Communications.

Tips and Tricks

Speak clearly and pause. The system uses silence detection to know when you are done talking
Be specific, just like text. "Search for React 19 migration guides and summarize the top 3" works the same way by voice
Use voice for brainstorming. Thinking out loud with an employee is faster than typing for exploratory tasks
Switch modes freely. Start with voice, send a text, go back to voice. The employee keeps full context across both
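To see why pausing helps, here is a minimal sketch of silence-based end-of-turn detection. It is an illustration only: the real system may rely on the speech provider's own endpointing, and the function name, the 20 ms frame size, the loudness threshold, and the 800 ms silence window are all assumed values, not documented settings.

```python
from typing import Iterable, Optional


def end_of_utterance(
    frame_levels: Iterable[float],
    frame_ms: int = 20,               # assumed duration of one audio frame
    silence_threshold: float = 0.02,  # assumed loudness below which a frame is "silent"
    min_silence_ms: int = 800,        # assumed pause length that ends the turn
) -> Optional[int]:
    """Return the index of the frame where the speaker is considered done, or None."""
    silent_ms = 0
    heard_speech = False
    for index, level in enumerate(frame_levels):
        if level >= silence_threshold:
            # Speech resets the silence counter.
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            # Only count silence that follows actual speech.
            silent_ms += frame_ms
            if silent_ms >= min_silence_ms:
                return index
    return None


if __name__ == "__main__":
    # 10 quiet frames, 5 loud frames, then 50 quiet frames.
    levels = [0.0] * 10 + [0.5] * 5 + [0.0] * 50
    print(end_of_utterance(levels))
```

A short pause inside a sentence never reaches the silence window, so the turn stays open; a clear pause at the end closes it. This is why trailing off or speaking over constant background noise can delay the response.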

Behind the Scenes


Component	Details
Speech-to-text	Deepgram (nova-2 model, real-time WebSocket streaming)
Text-to-speech	Deepgram (aura-2 voices, sentence-by-sentence streaming)
Voice options	5 female and 2 male English voices, plus Spanish variants
Same pipeline	Transcribed text goes through the exact same Dispatcher and execution pipeline as text chat
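For readers curious what the provider side might look like, the sketch below builds request URLs for Deepgram's streaming transcription and text-to-speech endpoints. It is a hedged illustration, not the product's integration code: the helper names are hypothetical, the chosen query parameters are assumptions about how such a connection could be configured, and the voice name must come from Deepgram's published voice list. Check the provider documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Deepgram endpoints for live transcription (WebSocket) and speech synthesis (HTTPS).
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"
DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def listen_url(model: str = "nova-2", language: str = "en") -> str:
    """Build a streaming speech-to-text URL (hypothetical helper)."""
    query = urlencode(
        {
            "model": model,
            "language": language,
            # Assumed setting: ask for partial transcripts while the user is speaking.
            "interim_results": "true",
        }
    )
    return f"{DEEPGRAM_LISTEN_URL}?{query}"


def speak_url(voice: str) -> str:
    """Build a text-to-speech URL for one voice (hypothetical helper)."""
    return f"{DEEPGRAM_SPEAK_URL}?{urlencode({'model': voice})}"


if __name__ == "__main__":
    print(listen_url())
    # The voice name here is an example; verify it against the provider's list.
    print(speak_url("aura-2-thalia-en"))
```

Authentication, audio encoding, and the WebSocket client itself are intentionally left out; the point is only that the speech-to-text and text-to-speech models named in the table are selected per request.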

What It Costs


Item	Cost
Voice credits	~3 credits per minute (rounded up)
Token credits	Standard token credits for the employee's thinking, same as text chat
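Example	A typical 5-minute call costs around 25 credits total: about 15 voice credits (5 minutes at ~3 credits per minute) plus standard token credits for the employee's thinking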
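The arithmetic behind the example can be sketched as follows. This is an estimate under stated assumptions, not the billing implementation: it assumes "rounded up" means the call length is rounded up to the next whole minute, that calls under the one-second minimum described under Good to Know cost nothing, and that the rate is exactly 3 credits per minute. The function names are hypothetical.

```python
import math

VOICE_CREDITS_PER_MINUTE = 3  # documented as approximately 3
MIN_CALL_SECONDS = 1          # shorter calls are ignored


def voice_credits(call_seconds: float) -> int:
    """Estimate voice credits, rounding the duration up to whole minutes (assumption)."""
    if call_seconds < MIN_CALL_SECONDS:
        # Assumption: an ignored call is not billed.
        return 0
    return math.ceil(call_seconds / 60) * VOICE_CREDITS_PER_MINUTE


def call_cost(call_seconds: float, token_credits: int) -> int:
    """Total estimate: voice credits plus token credits for the employee's thinking."""
    return voice_credits(call_seconds) + token_credits


if __name__ == "__main__":
    # A 5-minute call with an assumed 10 token credits of reasoning.
    print(call_cost(300, token_credits=10))
```

Under these assumptions a 5-minute call uses 15 voice credits, so the "around 25 credits" figure implies roughly 10 token credits; actual token usage depends on how much work the employee does during the call.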

Is It Safe?

No audio storage. Raw audio is streamed to the speech provider for transcription and immediately discarded. No audio recordings are saved
Transcripts only. Only the text transcript is persisted in your conversation history, same as any text message
Microphone access. Your browser asks for permission before enabling the microphone. You can revoke it anytime in browser settings

Good to Know

Everything is saved as text. Voice messages are transcribed and saved in chat history. When you come back later, the full conversation is there as text
Mixed mode. Voice and text live in the same conversation. Nothing is lost when you switch between them
Minimum 1 second. Calls shorter than 1 second are ignored to filter accidental clicks
Voice can be disabled. Administrators can turn off voice for the organization from Settings, Technical, Communications

Frequently Asked Questions

Q: Can the employee hear background noise? A: The speech provider uses voice activity detection to filter noise, but a quiet environment gives the best results.

Q: What languages does voice support? A: English is the primary language with multiple voice options. Spanish variants are also available. More languages are on the roadmap.

Q: Is my voice recorded? A: No. Audio is streamed for real-time transcription and not stored. Only the text transcript is saved.

Q: Can I use voice on mobile? A: Yes, if your mobile browser supports microphone access (most do). The experience works the same way.

Q: Does the employee use different skills or tools during voice calls? A: No. Voice messages go through the exact same pipeline as text. The employee has access to all the same skills, tools, and memory.