The LLM (Large Language Model) powers your agent’s conversation intelligence — understanding callers and generating responses.

Providers and models

| Provider | Model | Type | Notes |
|---|---|---|---|
| OpenAI | GPT-4.1 | Standard | Best overall quality |
| Azure | GPT-4.1 | Standard | Same model, hosted on Azure |
| OpenAI | GPT-4.1 mini | Standard | Faster, lower cost |
| Google | Gemini 2.5 Flash | Standard | Fast with good quality |
| Google | Gemini 2.5 Flash Live | Native Realtime | Ultra-low latency, built-in voice |
| Google | Gemini 2.5 Flash Live Cascade | Cascade | Realtime with text fallback |
| OpenAI | Realtime | Native Realtime | Ultra-low latency, built-in voice |
| OpenAI | Realtime Cascade | Cascade | Realtime with text fallback |

Standard vs Realtime vs Cascade

  • Standard — Text-based LLM with separate TTS/STT. Most flexible voice options. Best for complex conversations.
  • Native Realtime — Voice-native model with built-in speech. Lowest latency. Limited voice selection (provider’s built-in voices only).
  • Cascade — Starts with realtime for a fast initial response, then hands off to text-based processing. Good balance of speed and quality.
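The cascade behavior above can be sketched as a simple routing rule. This is a minimal illustration, not the actual implementation; the function and pipeline names are hypothetical.

```python
# Hypothetical sketch of cascade routing. The first turn goes to the
# realtime model so the caller hears a near-instant greeting; later
# turns use the standard text pipeline (LLM + separate TTS/STT).
def choose_pipeline(turn_index: int, realtime_available: bool) -> str:
    if turn_index == 0 and realtime_available:
        return "native-realtime"  # built-in voice, lowest latency
    return "standard"             # text LLM, most flexible voices

# First caller turn is handled by the realtime model:
first = choose_pipeline(0, realtime_available=True)
# Every subsequent turn uses the standard pipeline:
later = choose_pipeline(1, realtime_available=True)
```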

Temperature

Controls how creative vs deterministic the agent’s responses are.
| Value | Behavior | Best for |
|---|---|---|
| 0.0 – 0.3 | Very consistent, predictable | Factual Q&A, compliance-sensitive scenarios |
| 0.4 – 0.7 | Balanced (default: 0.7) | General customer service |
| 0.8 – 1.2 | More varied, creative | Casual conversations |
| 1.3 – 2.0 | Highly creative, less predictable | Not recommended for production |
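Mechanically, temperature divides the model's logits before sampling: low values sharpen the token distribution (near-deterministic), high values flatten it (more varied). A small sketch of that standard scaling:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax.

    Low temperature -> probability mass concentrates on the top token.
    High temperature -> distribution approaches uniform.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharply peaked
high = softmax_with_temperature(logits, 1.5)  # much flatter
```

This is why 0.0–0.3 suits compliance-sensitive answers (the same prompt yields nearly the same response) while higher values produce more varied phrasing.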

Impact on other settings

When you select a Native Realtime provider:
  • TTS provider is automatically set to the realtime provider’s built-in voice
  • STT provider is set to null (built-in)
  • Voice selection switches to the provider’s available realtime voices
  • Speaking rate is not adjustable
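The overrides above amount to a normalization step when settings are saved. A minimal sketch, assuming a flat settings dictionary; all field names here are illustrative, not the product's real schema.

```python
# Hypothetical settings normalization for Native Realtime selection.
def apply_model_selection(settings: dict) -> dict:
    s = dict(settings)  # don't mutate the caller's dict
    if s.get("llm_type") == "native-realtime":
        s["tts_provider"] = s["llm_provider"]  # built-in voice only
        s["stt_provider"] = None               # speech handled natively
        s.pop("speaking_rate", None)           # not adjustable
    return s

before = {"llm_provider": "openai", "llm_type": "native-realtime",
          "tts_provider": "elevenlabs", "stt_provider": "deepgram",
          "speaking_rate": 1.1}
after = apply_model_selection(before)
```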

Who can edit

Only Super-Admin and Dev-Admin roles can configure LLM settings. Client-Admin and Client-Employee users don’t see this section.