How-to

Voice

Voice turns an Agent from text-only to spoken. The microphone captures audio, speech-to-text (STT) transcribes it, the Agent reasons in text, and text-to-speech (TTS) reads the response back. The STT and TTS providers are each configurable.

What you'll learn
  • How the voice path flows end to end
  • How to enable voice on an Agent
  • Which STT and TTS providers are supported
  • When voice is the right mode and when it is not

The voice path

The client captures microphone input and streams it to the configured STT provider, which transcribes it to text. The transcript is passed to the Agent like any other message, and the Agent's response is sent through the configured TTS provider for playback. Every leg is traced.
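The three legs described above can be sketched as a small pipeline. This is an illustration only, not the product's implementation: `run_voice_turn`, `VoiceTurn`, `TraceSpan`, and the stand-in provider functions are hypothetical names, and the sketch processes one complete audio buffer rather than a live stream.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TraceSpan:
    leg: str           # "stt", "agent", or "tts"
    duration_s: float  # wall-clock time spent in that leg


@dataclass
class VoiceTurn:
    transcript: str
    reply_text: str
    reply_audio: bytes
    trace: List[TraceSpan] = field(default_factory=list)


def run_voice_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> VoiceTurn:
    """Run one voice turn: audio -> text -> Agent reply -> audio."""
    trace: List[TraceSpan] = []

    def timed(leg, fn, arg):
        # Record one trace span per leg so every step is observable.
        start = time.perf_counter()
        result = fn(arg)
        trace.append(TraceSpan(leg, time.perf_counter() - start))
        return result

    transcript = timed("stt", stt, audio)
    reply_text = timed("agent", agent, transcript)
    reply_audio = timed("tts", tts, reply_text)
    return VoiceTurn(transcript, reply_text, reply_audio, trace)


# Stand-in providers (assumptions for the demo; real providers call external services).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(b"hello", fake_stt, fake_agent, fake_tts)
print(turn.transcript, "|", turn.reply_text, "|", [s.leg for s in turn.trace])
```

Because the Agent leg only ever sees text, the STT and TTS callables can be swapped independently without touching the Agent itself.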

Enable voice on an Agent

  1. Open step 7 of the builder

    Voice Configuration is off by default. Toggle it on.
  2. Pick an STT provider

    Choose from the providers registered in Workspace Settings, for example OpenAI Whisper, Google Speech-to-Text, AWS Transcribe, or a self-hosted STT.
  3. Pick a TTS provider and voice

    Choose a TTS provider and a voice. Most providers ship multiple voices; preview before saving.
  4. Save and Publish

    Publish the Agent. The test view now shows a microphone icon and reads responses aloud.
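The choices made in the builder steps above can be pictured as a small settings record. The sketch below is a mental model only: `VoiceConfig`, `validate_voice_config`, and the provider identifiers are hypothetical and do not describe the product's actual storage format or API. It mirrors three facts from the steps: voice is off by default, providers must be registered in Workspace Settings, and a TTS voice has to be chosen.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class VoiceConfig:
    enabled: bool = False               # voice is off by default
    stt_provider: Optional[str] = None  # hypothetical provider identifier
    tts_provider: Optional[str] = None  # hypothetical provider identifier
    tts_voice: Optional[str] = None     # voice name offered by the TTS provider


def validate_voice_config(config: VoiceConfig, registered: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems: List[str] = []
    if not config.enabled:
        return problems  # nothing to check while voice is switched off
    for label, value in (
        ("STT provider", config.stt_provider),
        ("TTS provider", config.tts_provider),
    ):
        if value is None:
            problems.append(f"{label} is not set")
        elif value not in registered:
            problems.append(f"{label} '{value}' is not registered in Workspace Settings")
    if config.tts_voice is None:
        problems.append("TTS voice is not set")
    return problems


# Made-up identifiers standing in for providers registered in Workspace Settings.
registered_providers = {"openai-whisper", "elevenlabs"}

ok = VoiceConfig(True, "openai-whisper", "elevenlabs", "narrator")
incomplete = VoiceConfig(True, "openai-whisper", "unknown-tts", None)

print(validate_voice_config(ok, registered_providers))
print(validate_voice_config(incomplete, registered_providers))
```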

When voice is the right mode

Voice fits inbound call deflection, field-ops Agents on mobile, accessibility-first interfaces, and hands-free support flows. It is not the right mode for code generation, structured data entry, or anything that needs to be skim-read.

Frequently asked questions

Can I use different STT and TTS providers?
Yes. STT and TTS are configured independently. You might pick Whisper for transcription and ElevenLabs for playback.
Is voice supported in the web widget?
Yes, the widget supports microphone capture and audio playback when the Agent has voice enabled. Browser permission for the microphone is required.
Does voice cost more per Run?
There is added STT and TTS spend on top of the LLM and Tool cost. Each leg is itemized in the Run Trace and rolls up in Analytics.
Can I run voice on-premise?
Yes. Point STT and TTS at self-hosted providers or your private cloud endpoints. The voice pipeline stays inside your infrastructure.
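As a rough illustration of the per-Run cost answer above, the added voice spend is the STT and TTS legs on top of the LLM and Tool legs. The `RunCost` type and all amounts below are made up for the example; they are not real prices or the Run Trace schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class RunCost:
    # One itemized entry per leg of a Run (illustrative amounts in dollars).
    stt: Decimal = Decimal("0")
    llm: Decimal = Decimal("0")
    tools: Decimal = Decimal("0")
    tts: Decimal = Decimal("0")

    def voice_overhead(self) -> Decimal:
        """Spend that exists only because voice is enabled."""
        return self.stt + self.tts

    def total(self) -> Decimal:
        return self.stt + self.llm + self.tools + self.tts


# Hypothetical figures, chosen only to make the arithmetic visible.
cost = RunCost(
    stt=Decimal("0.0040"),
    llm=Decimal("0.0120"),
    tools=Decimal("0.0010"),
    tts=Decimal("0.0080"),
)

print("voice overhead:", cost.voice_overhead())
print("total:", cost.total())
```

`Decimal` is used instead of `float` so that summing many small per-leg amounts does not accumulate rounding error.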