# Talk Using Voice

Talk to your employees out loud. Click the voice button in any chat and have a real-time voice conversation, hands-free, no typing needed.

## TL;DR

Click the voice button in chat, speak naturally, and hear responses back. The full conversation is transcribed and saved in chat history just like text messages. No setup required.

## How It Works

1. **You speak.** Your microphone captures audio and streams it for real-time speech-to-text
2. **Employee processes.** The transcribed text goes through the same execution pipeline as a typed message (same skills, tools, memory, and reasoning)
3. **Employee speaks back.** The response is converted to speech sentence-by-sentence, so you start hearing the reply while the employee is still generating the rest

A live status indicator shows what is happening: listening, processing, or speaking.

## What You Can Do

| Action | How |
|--------|-----|
| **Start a voice call** | Click the **voice button** in the chat input area |
| **Talk naturally** | Speak as you would to a colleague. The employee listens and responds |
| **Switch to text mid-call** | Send a text message during a voice call. Context is shared |
| **Go back to typing** | Hang up and continue the conversation in text |
| **Use all employee capabilities** | Voice messages go through the same pipeline. The employee can use tools, search the web, generate images, and everything else |

## How to Set It Up

**Nothing to do.** Voice is available in every conversation by default. Click the voice button and allow microphone access when prompted.

Administrators can disable voice for the entire organization from **Settings, Technical, Communications**.
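
For readers curious about the sentence-by-sentence reply described under How It Works, here is a minimal sketch of the general technique: buffer the text as it is generated and release each sentence as soon as it is complete, so speech can start before the full reply exists. This is an illustration only, not the product's actual code. The function name `sentences_from_stream` and the simple boundary rule (`.`, `!`, or `?` followed by whitespace) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by whitespace.
# Deliberately simple; abbreviations such as "e.g. " would split early.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    reply = ["Hello the", "re. How are", " you? I am", " fine."]
    for sentence in sentences_from_stream(reply):
        # In a real system, each sentence would be handed to text-to-speech here.
        print(sentence)
```

Each yielded sentence can be sent to speech synthesis immediately, which is why the first words are heard while the rest of the reply is still being generated.
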
## Tips and Tricks

- **Speak clearly and pause.** The system uses silence detection to know when you are done talking
- **Be specific, just like text.** "Search for React 19 migration guides and summarize the top 3" works the same way by voice
- **Use voice for brainstorming.** Thinking out loud with an employee is faster than typing for exploratory tasks
- **Switch modes freely.** Start with voice, send a text, go back to voice. The employee keeps full context across both

## Behind the Scenes

| | |
|---|---|
| **Speech-to-text** | Deepgram (nova-2 model, real-time WebSocket streaming) |
| **Text-to-speech** | Deepgram (aura-2 voices, sentence-by-sentence streaming) |
| **Voice options** | 5 female and 2 male English voices, plus Spanish variants |
| **Same pipeline** | Transcribed text goes through the exact same Dispatcher and execution pipeline as text chat |

## What It Costs

| | |
|---|---|
| **Voice credits** | ~3 credits per minute (rounded up) |
| **Token credits** | Standard token credits for the employee's thinking, same as text chat |
| **Example** | A typical 5-minute call costs around 25 credits total (about 15 voice credits plus token credits) |

## Is It Safe

- **No audio storage.** Raw audio is streamed to the speech provider for transcription and immediately discarded. No audio recordings are saved
- **Transcripts only.** Only the text transcript is persisted in your conversation history, same as any text message
- **Microphone access.** Your browser asks for permission before enabling the microphone. You can revoke it anytime in browser settings

## Good to Know

- **Everything is saved as text.** Voice messages are transcribed and saved in chat history. When you come back later, the full conversation is there as text
- **Mixed mode.** Voice and text live in the same conversation.
  Nothing is lost when you switch between them
- **Minimum 1 second.** Calls shorter than 1 second are ignored to filter accidental clicks
- **Voice can be disabled.** Administrators can turn off voice for the organization from Settings, Technical, Communications

## Frequently Asked Questions

**Q: Can the employee hear background noise?**
A: The speech provider uses voice activity detection to filter noise, but a quiet environment gives the best results.

**Q: What languages does voice support?**
A: English is the primary language with multiple voice options. Spanish variants are also available. More languages are on the roadmap.

**Q: Is my voice recorded?**
A: No. Audio is streamed for real-time transcription and not stored. Only the text transcript is saved.

**Q: Can I use voice on mobile?**
A: Yes, if your mobile browser supports microphone access (most do). The experience works the same way.

**Q: Does the employee use different skills or tools during voice calls?**
A: No. Voice messages go through the exact same pipeline as text. The employee has access to all the same skills, tools, and memory.
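
As a rough illustration of the figures under What It Costs, the sketch below estimates the credits for a call. It rests on several assumptions that the page does not confirm: that "rounded up" means the call length is rounded up to the next whole minute, that the rate is exactly 3 credits per minute, and that calls shorter than 1 second (which are ignored) cost nothing. The function names and the token-credit figure in the usage example are hypothetical.

```python
import math

VOICE_CREDITS_PER_MINUTE = 3  # "~3 credits per minute" from the pricing table
MIN_CALL_SECONDS = 1.0        # calls shorter than this are ignored


def estimate_voice_credits(call_seconds: float) -> int:
    """Estimate voice credits, assuming billing per started minute."""
    if call_seconds < MIN_CALL_SECONDS:
        # Assumption: an ignored call is not billed.
        return 0
    billed_minutes = math.ceil(call_seconds / 60)
    return billed_minutes * VOICE_CREDITS_PER_MINUTE


def estimate_call_total(call_seconds: float, token_credits: int) -> int:
    """Voice credits plus the token credits used for the employee's thinking."""
    return estimate_voice_credits(call_seconds) + token_credits


if __name__ == "__main__":
    print(estimate_voice_credits(300))                 # 5-minute call: 15
    print(estimate_call_total(300, token_credits=10))  # with 10 hypothetical token credits: 25
```

Token credits depend on how much work the employee does during the call, so the total for a call of a given length varies.
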