Configure how your agent sounds and listens: the voice, the speaking speed, and speech recognition.

Text-to-Speech (TTS)

The TTS provider converts the agent’s text responses into spoken audio.

Providers

| Provider | Voices | Notes |
|---|---|---|
| Cartesia | 10+ voices | Default. High quality, fast. |
| ElevenLabs | Multiple | Premium voice quality, custom voice cloning |
| Gemini Live | Puck, Fenrir, etc. | Only with Gemini Live LLM |
| OpenAI Realtime | marin, etc. | Only with OpenAI Realtime LLM |
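
The pairing rule in the Notes column can be sketched as a small check. This is only an illustration: the identifiers (`cartesia`, `gemini_live`, and so on) and the function name are hypothetical, not the platform's actual configuration keys. The table states no LLM restriction for Cartesia or ElevenLabs, so the sketch assumes they can be paired with any LLM.

```python
# Hypothetical identifiers; the platform's real config keys may differ.
# Native realtime TTS providers mapped to the only LLM they work with.
REQUIRED_LLM = {
    "gemini_live": "gemini_live",
    "openai_realtime": "openai_realtime",
}

# Assumption: the table lists no LLM restriction for these providers.
UNRESTRICTED_TTS = {"cartesia", "elevenlabs"}


def tts_compatible(tts_provider: str, llm: str) -> bool:
    """Return True if the TTS provider can be paired with the given LLM."""
    if tts_provider in UNRESTRICTED_TTS:
        return True
    if tts_provider in REQUIRED_LLM:
        return REQUIRED_LLM[tts_provider] == llm
    raise ValueError(f"unknown TTS provider: {tts_provider!r}")
```

For example, `tts_compatible("gemini_live", "openai_realtime")` is false under these assumptions, while `tts_compatible("cartesia", ...)` is true for any LLM.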

Voice selection

Each provider offers a set of predefined voices. Select one from the dropdown on the agent detail page.

Custom voice: If you have a custom voice model (e.g., a cloned voice on ElevenLabs), enter its voice UUID in the custom voice ID field.

Speaking rate

Adjust how fast the agent speaks:

| Rate | Effect |
|---|---|
| 0.5 | Very slow |
| 1.0 | Normal speed (default) |
| 1.5 | Noticeably faster |
| 2.0 | Maximum speed |

Note: Speaking rate is not adjustable for Native Realtime providers (Gemini Live, OpenAI Realtime).
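
As a rough sketch of how these rules combine, the helper below resolves the rate to apply for a given provider. The function name and provider identifiers are hypothetical, and treating 0.5 as the lower bound is an assumption: the table lists it as the slowest value but only names 2.0 as a maximum.

```python
from typing import Optional

DEFAULT_RATE = 1.0  # "Normal speed (default)" in the table
MAX_RATE = 2.0      # "Maximum speed" in the table
MIN_RATE = 0.5      # Assumption: the slowest listed value is the lower bound

# Hypothetical identifiers for the Native Realtime providers.
NATIVE_REALTIME = {"gemini_live", "openai_realtime"}


def resolve_speaking_rate(provider: str, requested: Optional[float] = None) -> Optional[float]:
    """Return the rate to apply, or None when the provider has no adjustable rate."""
    if provider in NATIVE_REALTIME:
        return None  # speaking rate is not adjustable for these providers
    if requested is None:
        return DEFAULT_RATE
    if not MIN_RATE <= requested <= MAX_RATE:
        raise ValueError(f"speaking rate must be between {MIN_RATE} and {MAX_RATE}")
    return requested
```

Rejecting out-of-range values rather than silently clamping them is a design choice in this sketch; the actual UI may behave differently.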

Speech-to-Text (STT)

The STT provider transcribes the caller’s speech into text for the LLM.

| Provider | Notes |
|---|---|
| Deepgram | Default. Fast and accurate for German and English. |
| ElevenLabs | Good multilingual support |
| Mistral | Alternative option |

Note: STT is not configurable for Native Realtime LLMs because they have built-in speech recognition.

Who can edit

| Role | Voice/TTS | Speaking Rate | STT |
|---|---|---|---|
| Super-Admin / Dev-Admin | Full | Full | Full |
| Client-Admin | Full | Full | Hidden |
| Client-Employee | Read-only | Read-only | Hidden |
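
The permission matrix above can be modeled as a simple lookup. This is a minimal sketch for illustration only; the role and setting keys are hypothetical names, not identifiers taken from the platform.

```python
from enum import Enum


class Access(Enum):
    FULL = "full"
    READ_ONLY = "read-only"
    HIDDEN = "hidden"


# Hypothetical keys mirroring the rows and columns of the table.
PERMISSIONS = {
    "super_admin": {"voice_tts": Access.FULL, "speaking_rate": Access.FULL, "stt": Access.FULL},
    "dev_admin": {"voice_tts": Access.FULL, "speaking_rate": Access.FULL, "stt": Access.FULL},
    "client_admin": {"voice_tts": Access.FULL, "speaking_rate": Access.FULL, "stt": Access.HIDDEN},
    "client_employee": {
        "voice_tts": Access.READ_ONLY,
        "speaking_rate": Access.READ_ONLY,
        "stt": Access.HIDDEN,
    },
}


def access_for(role: str, setting: str) -> Access:
    """Look up the access level a role has for a setting (KeyError if unknown)."""
    return PERMISSIONS[role][setting]


def can_edit(role: str, setting: str) -> bool:
    """Only Full access allows editing; Read-only and Hidden do not."""
    return access_for(role, setting) is Access.FULL
```

Under this model, `can_edit("client_admin", "stt")` is false because STT is hidden for that role, and `can_edit("client_employee", "voice_tts")` is false because the setting is read-only.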