The RealtimeVoiceClient provides a WebSocket client for the xAI Realtime Voice Agent API at wss://api.x.ai/v1/realtime. It supports bidirectional audio/text streaming, server-side voice activity detection (VAD), function calling, and built-in tools like web search and X search.
    using Xai;

    using var client = new RealtimeVoiceClient(apiKey);
    await client.ConnectAsync();

    // Configure the session
    await client.SendEventAsync(RealtimeClientEvent.SessionUpdate(new RealtimeSessionConfig
    {
        Voice = "Eve",
        Instructions = "You are a helpful assistant. Respond briefly.",
        Modalities = ["text", "audio"],
        TurnDetection = new RealtimeTurnDetection
        {
            Type = "server_vad",
            Threshold = 0.85,
            SilenceDurationMs = 500,
        },
    }));

    // Send a text message and request a response
    await client.SendEventAsync(RealtimeClientEvent.UserMessage("What's the weather like today?"));
    await client.SendEventAsync(RealtimeClientEvent.CreateResponse(["text"]));

    // Process server events
    await foreach (var serverEvent in client.ReceiveUpdatesAsync(cancellationToken))
    {
        if (serverEvent.IsAudioTranscriptDelta)
            Console.Write(serverEvent.Delta);
        else if (serverEvent.IsResponseDone)
            break;
        else if (serverEvent.IsError)
            Console.Error.WriteLine($"Error: {serverEvent.Error?.Message}");
    }
Voices
Five voices are available:

| Voice | Description       |
| ----- | ----------------- |
| Eve   | Default voice     |
| Ara   | Alternative voice |
| Rex   | Alternative voice |
| Sal   | Alternative voice |
| Leo   | Alternative voice |
Audio Formats
Configure input/output audio formats via RealtimeAudioConfig:
    await client.SendEventAsync(RealtimeClientEvent.SessionUpdate(new RealtimeSessionConfig
    {
        Voice = "Eve",
        Modalities = ["text", "audio"],
        Audio = new RealtimeAudioConfig
        {
            Input = new RealtimeAudioDirectionConfig
            {
                Format = new RealtimeAudioFormatConfig
                {
                    Type = "audio/pcm", // Linear16 little-endian
                    Rate = 24000,       // Sample rate in Hz
                },
            },
            Output = new RealtimeAudioDirectionConfig
            {
                Format = new RealtimeAudioFormatConfig
                {
                    Type = "audio/pcm",
                    Rate = 24000,
                },
            },
        },
    }));
    // Read audio from a file or microphone
    byte[] audioChunk = GetAudioChunk(); // your audio source
    string base64Audio = Convert.ToBase64String(audioChunk);

    // Send audio data
    await client.SendEventAsync(RealtimeClientEvent.AppendAudio(base64Audio));

    // Commit the audio buffer (in manual mode, or to force processing)
    await client.SendEventAsync(RealtimeClientEvent.CommitAudio());
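When streaming a longer recording, it can help to split the raw PCM into short chunks before encoding each one as base64. The helper below is a minimal sketch using only the .NET standard library. `PcmChunker`, `ToBase64Chunks`, and the 100 ms default chunk length are illustrative names and values, not part of the client, and the code assumes 16-bit mono PCM at the configured sample rate.

```csharp
using System;
using System.Collections.Generic;

public static class PcmChunker
{
    // Assumption: 16-bit samples (2 bytes each), single channel.
    private const int BytesPerSample = 2;

    // Lazily yields base64-encoded chunks of roughly chunkMs milliseconds each.
    // The final chunk may be shorter than the others.
    public static IEnumerable<string> ToBase64Chunks(
        byte[] pcm, int sampleRate = 24000, int chunkMs = 100)
    {
        if (pcm is null) throw new ArgumentNullException(nameof(pcm));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        if (chunkMs <= 0) throw new ArgumentOutOfRangeException(nameof(chunkMs));

        // Whole number of samples per chunk; never smaller than one sample,
        // so the loop below always advances.
        int samplesPerChunk = Math.Max(1, sampleRate * chunkMs / 1000);
        int chunkBytes = samplesPerChunk * BytesPerSample;

        for (int offset = 0; offset < pcm.Length; offset += chunkBytes)
        {
            int count = Math.Min(chunkBytes, pcm.Length - offset);
            yield return Convert.ToBase64String(pcm, offset, count);
        }
    }
}
```

Each yielded string could then be passed to `RealtimeClientEvent.AppendAudio(...)` in a loop, in the same way as `base64Audio` in the snippet above. Because the method is an iterator, argument validation runs when enumeration starts rather than at the call site.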
Turn Detection
Server VAD (Automatic)
The server automatically detects when the user starts and stops speaking:
    TurnDetection = new RealtimeTurnDetection
    {
        Type = "server_vad",
        Threshold = 0.85,        // Sensitivity (0.1 - 0.9)
        SilenceDurationMs = 500, // Silence before ending turn (0 - 10000)
        PrefixPaddingMs = 333,   // Audio to include before speech start (0 - 10000)
    }
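The ranges in the comments above can be checked on the client before the session update is sent. This is a hedged sketch: `TurnDetectionRanges.Validate` is a hypothetical helper, not part of the library, and how the server treats out-of-range values (rejecting or clamping them) is not stated here.

```csharp
using System;

public static class TurnDetectionRanges
{
    // Hypothetical client-side check of the documented ranges:
    // Threshold 0.1 - 0.9, SilenceDurationMs and PrefixPaddingMs 0 - 10000.
    public static void Validate(double threshold, int silenceDurationMs, int prefixPaddingMs)
    {
        if (double.IsNaN(threshold) || threshold < 0.1 || threshold > 0.9)
            throw new ArgumentOutOfRangeException(
                nameof(threshold), threshold, "Expected a value from 0.1 to 0.9.");

        if (silenceDurationMs < 0 || silenceDurationMs > 10000)
            throw new ArgumentOutOfRangeException(
                nameof(silenceDurationMs), silenceDurationMs, "Expected a value from 0 to 10000.");

        if (prefixPaddingMs < 0 || prefixPaddingMs > 10000)
            throw new ArgumentOutOfRangeException(
                nameof(prefixPaddingMs), prefixPaddingMs, "Expected a value from 0 to 10000.");
    }
}
```

Calling `TurnDetectionRanges.Validate(0.85, 500, 333)` with the values from the snippet returns without throwing.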
Manual Mode
Set TurnDetection to null for manual control. In this mode you decide when to commit the audio buffer and when to trigger a response:
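A manual turn might look like the sketch below. It uses only calls that appear elsewhere in this document (`SessionUpdate`, `AppendAudio`, `CommitAudio`, `CreateResponse`); the wrapper class `ManualTurnExample` and its method are hypothetical, and the code assumes the client is already connected and the chunks are base64-encoded audio in the configured input format.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Xai;

public static class ManualTurnExample
{
    // Sketch: sends one manually delimited user turn and requests a response.
    public static async Task SendTurnAsync(
        RealtimeVoiceClient client, IEnumerable<string> base64Chunks)
    {
        // Disable server-side VAD so the server does not end turns on its own.
        await client.SendEventAsync(RealtimeClientEvent.SessionUpdate(new RealtimeSessionConfig
        {
            Voice = "Eve",
            Modalities = ["text", "audio"],
            TurnDetection = null,
        }));

        // Append the audio for this turn.
        foreach (string chunk in base64Chunks)
            await client.SendEventAsync(RealtimeClientEvent.AppendAudio(chunk));

        // Mark the end of the user's turn, then ask for a response.
        await client.SendEventAsync(RealtimeClientEvent.CommitAudio());
        await client.SendEventAsync(RealtimeClientEvent.CreateResponse(["text", "audio"]));
    }
}
```

Server events for the response are then read from `ReceiveUpdatesAsync`, as in the quick-start loop.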
Function Calling
Declare function tools in the session configuration, then handle the resulting calls in the event loop:

    await client.SendEventAsync(RealtimeClientEvent.SessionUpdate(new RealtimeSessionConfig
    {
        Voice = "Eve",
        Modalities = ["text", "audio"],
        Tools =
        [
            RealtimeTool.Function(
                name: "get_weather",
                description: "Get the current weather for a location",
                parameters: new
                {
                    type = "object",
                    properties = new
                    {
                        location = new { type = "string", description = "City name" },
                    },
                    required = new[] { "location" },
                }),
        ],
    }));

    // Handle function calls in the event loop
    await foreach (var serverEvent in client.ReceiveUpdatesAsync(cancellationToken))
    {
        if (serverEvent.IsFunctionCallArgumentsDone)
        {
            // Execute the function
            var result = ExecuteFunction(serverEvent.Name!, serverEvent.Arguments!);

            // Send the result back
            await client.SendEventAsync(RealtimeClientEvent.FunctionCallOutput(serverEvent.CallId!, result));
            await client.SendEventAsync(RealtimeClientEvent.CreateResponse(["text", "audio"]));
        }
        else if (serverEvent.IsResponseDone)
        {
            break;
        }
    }
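The `ExecuteFunction` helper called in the event loop is left to the application. One possible shape is sketched below, assuming the arguments arrive as a JSON string and the output is sent back as a JSON string. `FunctionDispatcher` is a hypothetical name, and the weather values are placeholders rather than real data.

```csharp
using System;
using System.Text.Json;

public static class FunctionDispatcher
{
    // Hypothetical dispatcher: routes a function call by name and returns
    // a JSON string suitable for passing back as the function output.
    public static string ExecuteFunction(string name, string argumentsJson)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(argumentsJson);
            JsonElement args = doc.RootElement;

            switch (name)
            {
                case "get_weather":
                    // TryGetProperty is only valid on JSON objects, so check the kind first.
                    if (args.ValueKind != JsonValueKind.Object
                        || !args.TryGetProperty("location", out JsonElement location)
                        || location.ValueKind != JsonValueKind.String)
                    {
                        return Error("missing required string argument: location");
                    }

                    // Placeholder result; a real handler would query a weather service.
                    return JsonSerializer.Serialize(new
                    {
                        location = location.GetString(),
                        temperature_c = 21,
                        conditions = "sunny",
                    });

                default:
                    return Error("unknown function: " + name);
            }
        }
        catch (JsonException ex)
        {
            // Malformed arguments are reported to the model instead of crashing the loop.
            return Error("invalid arguments JSON: " + ex.Message);
        }
    }

    private static string Error(string message) =>
        JsonSerializer.Serialize(new { error = message });
}
```

Returning errors as JSON, rather than throwing, keeps the event loop running and lets the model see what went wrong with its call.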
Built-in Tools
The xAI Realtime API includes built-in tools:
    Tools =
    [
        // Web search: searches the internet
        RealtimeTool.WebSearch(),

        // X search: searches posts on X (Twitter)
        RealtimeTool.XSearch(),

        // X search with allowed handles filter
        RealtimeTool.XSearch(allowedHandles: ["@elonmusk", "@xaboratory"]),

        // File search: searches uploaded document collections
        RealtimeTool.FileSearch(collectionIds: ["col_abc123"], maxResults: 5),
    ]
Server Events Reference
Use the Is* helper properties to check event types: