Voice

Voice turns a text-only Agent into a spoken one. The microphone captures audio, speech-to-text (STT) transcribes it, the Agent reasons in text, and text-to-speech (TTS) reads the response back. The STT and TTS providers are both configurable.

What you'll learn

How the voice path flows end to end
How to enable voice on an Agent
Which STT and TTS providers are supported
When voice is the right mode and when it is not

The voice path

Microphone input is captured in the client and streamed to the configured STT provider, which transcribes it to text. The transcript is passed to the Agent like any other message, and the Agent's response is sent through the configured TTS provider for playback. Every leg is traced.
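The flow above can be sketched as three functions called in order. This is a minimal illustration, not the product's implementation: `run_voice_turn`, `VoiceTurn`, and the fake providers are hypothetical names, and each leg is modeled as a plain callable so the example runs without any real STT or TTS service.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins: each leg of the voice path is a plain function.
Transcriber = Callable[[bytes], str]   # STT: audio in, text out
Responder = Callable[[str], str]       # Agent: text in, text out
Synthesizer = Callable[[str], bytes]   # TTS: text in, audio out


@dataclass
class VoiceTurn:
    transcript: str
    reply_text: str
    reply_audio: bytes
    trace: List[str] = field(default_factory=list)


def run_voice_turn(
    audio: bytes, stt: Transcriber, agent: Responder, tts: Synthesizer
) -> VoiceTurn:
    """Run one voice turn: STT, then the Agent, then TTS, recording each leg."""
    trace: List[str] = []
    transcript = stt(audio)
    trace.append("stt")
    reply_text = agent(transcript)   # the Agent only ever sees text
    trace.append("agent")
    reply_audio = tts(reply_text)
    trace.append("tts")
    return VoiceTurn(transcript, reply_text, reply_audio, trace)


# Fake providers (assumptions for the demo): bytes are treated as UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(b"reset my password", fake_stt, fake_agent, fake_tts)
print(turn.transcript)
print(turn.reply_text)
print(turn.trace)
```

Because the Agent receives only the transcript, swapping either provider changes nothing in the reasoning step; only the first and last callables differ.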

Enable voice on an Agent

1. Open step 7 of the builder
Voice Configuration is off by default. Toggle it on.
2. Pick an STT provider
Choose from the providers registered in Workspace Settings, for example OpenAI Whisper, Google Speech-to-Text, AWS Transcribe, or a self-hosted STT.
3. Pick a TTS provider and voice
Choose a TTS provider and a voice. Most providers ship multiple voices; preview before saving.
4. Save and Publish
Publish the Agent. The test view now shows a microphone icon and reads responses aloud.
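The settings chosen in these steps can be pictured as a small configuration record. The sketch below is an assumption made for illustration: `VoiceConfig`, its field names, and the provider identifiers are hypothetical and do not reflect the builder's actual schema. It shows the two properties the steps describe: voice is off by default, and an enabled configuration needs an STT provider, a TTS provider, and a voice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceConfig:
    # Hypothetical field names; the builder's real schema may differ.
    enabled: bool = False                # voice is off by default
    stt_provider: Optional[str] = None
    tts_provider: Optional[str] = None   # chosen independently of STT
    tts_voice: Optional[str] = None


def validation_errors(config: VoiceConfig) -> List[str]:
    """Return what is still missing before a voice Agent could be published."""
    if not config.enabled:
        return []  # a text-only Agent needs no voice settings
    errors: List[str] = []
    if not config.stt_provider:
        errors.append("pick an STT provider")
    if not config.tts_provider:
        errors.append("pick a TTS provider")
    if not config.tts_voice:
        errors.append("pick a TTS voice")
    return errors


# Provider and voice identifiers below are made up for the example.
default_config = VoiceConfig()
half_done = VoiceConfig(enabled=True, stt_provider="openai-whisper")
ready = VoiceConfig(
    enabled=True,
    stt_provider="openai-whisper",
    tts_provider="elevenlabs",
    tts_voice="example-voice",
)

for cfg in (default_config, half_done, ready):
    print(validation_errors(cfg))
```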

When voice is the right mode

Voice fits inbound call deflection, field-ops Agents on mobile, accessibility-first interfaces, and hands-free support flows. It is not the right mode for code generation, structured data entry, or anything the user needs to skim.

Frequently asked questions

Can I use different STT and TTS providers?
Yes. STT and TTS are configured independently. You might pick Whisper for transcription and ElevenLabs for playback.

Is voice supported in the web widget?
Yes, the widget supports microphone capture and audio playback when the Agent has voice enabled. Browser permission for the microphone is required.

Does voice cost more per Run?
There is added STT and TTS spend on top of the LLM and Tool cost. Each leg is itemized in the Run Trace and rolls up in Analytics.

Can I run voice on-premise?
Yes. Point STT and TTS at self-hosted providers or your private cloud endpoints. The voice pipeline stays inside your infrastructure.
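
As a rough illustration of the cost question above, the per-Run total can be thought of as the LLM and Tool cost plus the two voice legs. The `RunCost` class and every figure below are invented for the example; they are not real prices and not the Run Trace's actual format.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class RunCost:
    # All amounts are illustrative, in an arbitrary currency unit.
    llm: Decimal
    tool: Decimal
    stt: Decimal = Decimal("0")   # zero for a text-only Run
    tts: Decimal = Decimal("0")

    @property
    def voice(self) -> Decimal:
        """Spend added by the voice legs alone."""
        return self.stt + self.tts

    @property
    def total(self) -> Decimal:
        return self.llm + self.tool + self.voice


text_run = RunCost(llm=Decimal("0.0120"), tool=Decimal("0.0030"))
voice_run = RunCost(
    llm=Decimal("0.0120"),
    tool=Decimal("0.0030"),
    stt=Decimal("0.0040"),
    tts=Decimal("0.0090"),
)

print("text-only total:", text_run.total)
print("voice legs:", voice_run.voice)
print("voice total:", voice_run.total)
```

Keeping STT and TTS as separate fields mirrors the itemized view: the voice overhead is the difference between the two totals.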

Related

Create an Agent
Test an Agent
Analytics overview: See voice cost broken out per Agent.