Sistava

Talk Using Voice

Talk to your employees out loud. Click the call button in any chat and have a real-time voice conversation, hands-free, no typing needed.

TL;DR

Click the voice button in chat, speak naturally, and hear responses back. The full conversation is transcribed and saved in chat history just like text messages. No setup required.

How It Works

  1. You speak. Your microphone captures audio and streams it for real-time speech-to-text.
  2. Employee processes. The transcribed text goes through the same execution pipeline as a typed message (same skills, tools, memory, and reasoning)
  3. Employee speaks back. The response is converted to speech sentence-by-sentence, so you start hearing the reply while the employee is still generating the rest
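
Step 3 is what keeps the call responsive: speech synthesis can start as soon as the first sentence is complete, without waiting for the whole reply. The sketch below illustrates that idea only. The function name `sentences_from_stream` and the simple punctuation-based boundary rule are assumptions made for illustration, not the product's actual implementation.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence and can be spoken now.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever remains once the reply has finished generating.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    reply = ["Hello the", "re. How are", " you? I am", " fine"]
    for sentence in sentences_from_stream(reply):
        print(sentence)  # each sentence would be handed to text-to-speech here
```

Each yielded sentence could be sent to text-to-speech right away, which is why you start hearing the reply before the employee has finished generating it.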

A live status indicator shows what is happening: listening, processing, or speaking.

What You Can Do

Action | How
Start a voice call | Click the voice button in the chat input area
Talk naturally | Speak as you would to a colleague. The employee listens and responds
Switch to text mid-call | Send a text message during a voice call. Context is shared
Go back to typing | Hang up and continue the conversation in text
Use all employee capabilities | Voice messages go through the same pipeline. The employee can use tools, search the web, generate images, and everything else

How to Set It Up

Nothing to do. Voice is available in every conversation by default. Click the voice button and allow microphone access when prompted.

Administrators can disable voice for the entire organization from Settings > Technical > Communications.

Behind the Scenes

Speech-to-text | Deepgram (nova-2 model, real-time WebSocket streaming)
Text-to-speech | Deepgram (aura-2 voices, sentence-by-sentence streaming)
Voice options | 5 female and 2 male English voices, plus Spanish variants
Same pipeline | Transcribed text goes through the exact same Dispatcher and execution pipeline as text chat
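
For readers curious what "real-time WebSocket streaming" with the nova-2 model looks like in practice, here is a minimal sketch that builds a connection URL for Deepgram's public live-transcription endpoint. The helper `build_listen_url` and the choice of parameters are assumptions for illustration. This page does not state how the product actually configures its connection.

```python
from urllib.parse import urlencode

# Deepgram's public streaming speech-to-text endpoint.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(
    model: str = "nova-2",
    language: str = "en",
    interim_results: bool = True,
) -> str:
    """Build a WebSocket URL for live transcription (hypothetical helper)."""
    params = {
        "model": model,  # model named in the table above
        "language": language,  # assumed default; Spanish would be "es"
        "interim_results": str(interim_results).lower(),  # partial transcripts while speaking
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url())
```

A client would open this URL with an authenticated WebSocket connection, send raw audio frames, and receive transcript messages as the speaker talks.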

What It Costs

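Voice credits | ~3 credits per minute (rounded up)
Token credits | Standard token credits for the employee's thinking, same as text chat
Example | A typical 5-minute call costs around 25 credits total: about 15 voice credits (5 minutes at ~3 credits each) plus token credits for the employee's responses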
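
The arithmetic behind that example can be sketched as a small estimator. Two things here are assumptions: that "rounded up" means the call length is rounded up to the next whole minute, and that token credits are simply added on top. The figure of 10 token credits is inferred from the example (25 total minus 15 voice credits), not a published rate.

```python
import math

VOICE_CREDITS_PER_MINUTE = 3  # approximate rate from the table above


def estimate_call_credits(duration_seconds: float, token_credits: int = 0) -> int:
    """Estimate total credits for a voice call (illustrative, not official billing logic)."""
    # Assumption: partial minutes are billed as full minutes.
    billed_minutes = math.ceil(duration_seconds / 60)
    voice_credits = billed_minutes * VOICE_CREDITS_PER_MINUTE
    # Token credits depend on how much the employee thinks and says.
    return voice_credits + token_credits


if __name__ == "__main__":
    # A 5-minute call with roughly 10 token credits of employee reasoning.
    print(estimate_call_credits(5 * 60, token_credits=10))
```

Under these assumptions, a call lasting 61 seconds would be billed as 2 minutes, or about 6 voice credits before token credits.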

Good to Know

Frequently Asked Questions

Q: Can the employee hear background noise?
A: The speech provider uses voice activity detection to filter noise, but a quiet environment gives the best results.

Q: What languages does voice support?
A: English is the primary language with multiple voice options. Spanish variants are also available. More languages are on the roadmap.

Q: Is my voice recorded?
A: No. Audio is streamed for real-time transcription and not stored. Only the text transcript is saved.

Q: Can I use voice on mobile?
A: Yes, if your mobile browser supports microphone access (most do). The experience works the same way.

Q: Does the employee use different skills or tools during voice calls?
A: No. Voice messages go through the exact same pipeline as text. The employee has access to all the same skills, tools, and memory.