Live API (Real-time Voice/Video)
The SDK supports the Gemini Live API for real-time bidirectional voice and video interactions over WebSocket.
Overview
The Live API enables:
- Real-time audio conversations with voice activity detection (VAD)
- Video frame streaming for live visual understanding
- Tool calling during live sessions
- Session resumption for reconnection without losing context
- Auto-reconnect via ResilientLiveSession for production use
Model requirements
The Live API requires a native-audio model (e.g., models/gemini-2.5-flash-native-audio-latest) and Audio response modality. Text-only response modality is not supported with these models.
Quick Start
```csharp
using Google.Gemini;

using var client = new GeminiClient(apiKey);

await using var session = await client.ConnectLiveAsync(new LiveSetupConfig
{
    Model = "models/gemini-2.5-flash-native-audio-latest",
    GenerationConfig = new GenerationConfig
    {
        ResponseModalities = [GenerationConfigResponseModalitie.Audio],
    },
});

// Send text, receive audio
await session.SendTextAsync("Hello, how are you?");

await foreach (var message in session.ReadEventsAsync())
{
    // Audio data in message.ServerContent.ModelTurn.Parts[].InlineData
    if (message.ServerContent?.TurnComplete == true)
        break;
}
```
Session Types
GeminiLiveSession
The basic session class for direct WebSocket communication:
```csharp
await using var session = await client.ConnectLiveAsync(config);
```
ResilientLiveSession
A wrapper that automatically reconnects when the server sends a GoAway message (e.g., for maintenance). Recommended for production use:
```csharp
await using var session = await client.ConnectResilientLiveAsync(config);

session.GoAwayReceived += (sender, goAway) =>
    Console.WriteLine($"Server closing in {goAway.TimeLeft}, reconnecting...");
session.Reconnected += (sender, _) =>
    Console.WriteLine("Reconnected!");

// Events flow transparently across reconnections
await foreach (var message in session.ReadEventsAsync())
{
    if (message.ServerContent?.TurnComplete == true)
        break;
}
```
Text
```csharp
// Sends text as a complete user turn (triggers a model response)
await session.SendTextAsync("What's the weather?");
```
Audio
```csharp
// Send PCM audio (16-bit, 16kHz, little-endian, mono)
await session.SendAudioAsync(pcmBytes);

// Send with a custom MIME type
await session.SendAudioAsync(audioBytes, "audio/pcm;rate=24000");

// Signal end of turn (if VAD is not handling it)
await session.SendClientContentAsync(turns: [], turnComplete: true);
```
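For longer recordings, sending the audio in small chunks keeps latency low and lets server-side VAD react while you are still speaking. The sketch below streams a raw PCM file in roughly 100 ms chunks; the file name and chunk size are illustrative assumptions, and only `SendAudioAsync` and `SendClientContentAsync` come from the API shown above.

```csharp
// A minimal streaming sketch, assuming 16-bit, 16 kHz, mono, little-endian
// PCM input as documented above. "input.pcm" is a placeholder path.
const int bytesPerSecond = 16_000 * 2;     // 16 kHz * 2 bytes per sample
const int chunkSize = bytesPerSecond / 10; // ~100 ms of audio per send

await using var file = File.OpenRead("input.pcm");
var buffer = new byte[chunkSize];
int read;
while ((read = await file.ReadAsync(buffer)) > 0)
{
    // Copy only the bytes actually read from the final (possibly short) chunk
    await session.SendAudioAsync(buffer.AsMemory(0, read).ToArray());
}

// With VAD disabled, mark the end of the user turn explicitly
await session.SendClientContentAsync(turns: [], turnComplete: true);
```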
Video
```csharp
// Send a JPEG frame
await session.SendVideoAsync(jpegBytes, "image/jpeg");

// Stream video at ~10 fps
foreach (var frame in videoFrames)
{
    await session.SendVideoAsync(frame, "image/jpeg");
    await Task.Delay(100);
}
```
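The `Task.Delay` loop above drifts slightly, because the delay is added on top of however long each send takes. If a steadier frame rate matters, `PeriodicTimer` targets a fixed interval instead. A sketch, where `GetNextFrameAsync` is a hypothetical stand-in for your capture source:

```csharp
// Pace frames at ~10 fps with PeriodicTimer, which waits until the next
// tick rather than adding a fixed delay after each send.
using var cts = new CancellationTokenSource();
using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(100));

while (await timer.WaitForNextTickAsync(cts.Token))
{
    byte[] jpeg = await GetNextFrameAsync(); // hypothetical capture helper
    await session.SendVideoAsync(jpeg, "image/jpeg");
}
```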
Multi-turn Conversation
```csharp
await session.SendClientContentAsync(
    turns:
    [
        new Content
        {
            Role = "user",
            Parts = [new Part { Text = "My name is Alice" }],
        },
        new Content
        {
            Role = "model",
            Parts = [new Part { Text = "Nice to meet you, Alice!" }],
        },
        new Content
        {
            Role = "user",
            Parts = [new Part { Text = "What's my name?" }],
        },
    ],
    turnComplete: true);
```
Receiving Events
All responses come through ReadEventsAsync():
```csharp
await foreach (var message in session.ReadEventsAsync())
{
    // Audio/text model response
    if (message.ServerContent?.ModelTurn?.Parts is { } parts)
    {
        foreach (var part in parts)
        {
            if (part.InlineData?.Data is { } audioData)
                PlayAudio(audioData); // 24kHz PCM
            if (part.Text is { } text)
                Console.Write(text);
        }
    }

    // Output transcription (text version of the audio response)
    if (message.ServerContent?.OutputTranscription?.Text is { } transcript)
        Console.Write(transcript);

    // Input transcription (text version of audio you sent)
    if (message.ServerContent?.InputTranscription?.Text is { } inputText)
        Console.Write($"[You said: {inputText}]");

    // Model was interrupted by new user input
    if (message.ServerContent?.Interrupted == true)
        Console.Write("[interrupted]");

    // Tool call request
    if (message.ToolCall is { } toolCall)
        HandleToolCall(toolCall);

    // Tool call was cancelled (user interrupted)
    if (message.ToolCallCancellation is { } cancellation)
        Console.Write($"Cancelled: {string.Join(", ", cancellation.Ids!)}");

    // Token usage
    if (message.UsageMetadata is { } usage)
        Console.Write($"[Tokens: {usage.TotalTokenCount}]");

    // Server requesting disconnect (handled automatically by ResilientLiveSession)
    if (message.GoAway is { } goAway)
        Console.Write($"[Server closing in {goAway.TimeLeft}]");

    if (message.ServerContent?.TurnComplete == true)
        break;
}
```
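Instead of playing audio live, you may want to capture a full model turn to disk. The sketch below accumulates the 24 kHz PCM parts and wraps them in a minimal WAV header so the result is playable; it assumes `InlineData.Data` exposes the raw PCM bytes, and `reply.wav` is an illustrative path. The header layout (RIFF/fmt/data) is the standard 16-bit mono PCM WAV format.

```csharp
// Collect the model's 24 kHz PCM audio for one turn
using var pcm = new MemoryStream();
await foreach (var message in session.ReadEventsAsync())
{
    if (message.ServerContent?.ModelTurn?.Parts is { } parts)
        foreach (var part in parts)
            if (part.InlineData?.Data is { } audio)
                pcm.Write(audio);

    if (message.ServerContent?.TurnComplete == true)
        break;
}

// Minimal WAV header: PCM (format 1), 1 channel, 24 kHz, 16-bit
using var wav = new BinaryWriter(File.Create("reply.wav"));
int dataLen = (int)pcm.Length;
wav.Write("RIFF"u8); wav.Write(36 + dataLen); wav.Write("WAVE"u8);
wav.Write("fmt "u8); wav.Write(16); wav.Write((short)1); wav.Write((short)1);
wav.Write(24000); wav.Write(24000 * 2); wav.Write((short)2); wav.Write((short)16);
wav.Write("data"u8); wav.Write(dataLen); wav.Write(pcm.ToArray());
```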
Configuration Options
Voice Selection
```csharp
var config = new LiveSetupConfig
{
    Model = "models/gemini-2.5-flash-native-audio-latest",
    GenerationConfig = new GenerationConfig
    {
        ResponseModalities = [GenerationConfigResponseModalitie.Audio],
        SpeechConfig = new SpeechConfig
        {
            VoiceConfig = new VoiceConfig
            {
                PrebuiltVoiceConfig = new PrebuiltVoiceConfig
                {
                    VoiceName = "Kore", // Aoede, Charon, Fenrir, Kore, Puck, etc.
                },
            },
        },
    },
};
```
System Instruction
```csharp
var config = new LiveSetupConfig
{
    // ...
    SystemInstruction = new Content
    {
        Parts = [new Part { Text = "You are a friendly pirate. Always respond in pirate speak." }],
    },
};
```
Transcription
```csharp
var config = new LiveSetupConfig
{
    // ...
    // Get text alongside audio responses
    OutputAudioTranscription = new LiveOutputAudioTranscription(),
    // Get text for audio you send
    InputAudioTranscription = new LiveInputAudioTranscription(),
};
```
Context Window Compression
For longer sessions that might exceed the context window:
```csharp
var config = new LiveSetupConfig
{
    // ...
    ContextWindowCompression = new LiveContextWindowCompression
    {
        SlidingWindow = new LiveSlidingWindow
        {
            TargetTokens = 1024, // tokens to retain after compression
        },
    },
};
```
Session Resumption
Reconnect without losing conversation context:
```csharp
var config = new LiveSetupConfig
{
    // ...
    SessionResumption = new LiveSessionResumptionConfig(),
};

await using var session1 = await client.ConnectLiveAsync(config);
// ... interact ...
var handle = session1.LastSessionResumptionHandle;

// Later, reconnect with the handle
var config2 = new LiveSetupConfig
{
    // ... same config ...
    SessionResumption = new LiveSessionResumptionConfig { Handle = handle },
};
await using var session2 = await client.ConnectLiveAsync(config2);
```
Tip: ResilientLiveSession handles this automatically when GoAway is received.
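If the reconnect happens in a new process (a crashed service, a redeploy), the handle has to survive the restart. A sketch that persists it to a file; the path is illustrative, and the handle should be treated as a short-lived secret since it grants access to the conversation:

```csharp
// Hypothetical persistence of the resumption handle across restarts
var handlePath = "session.handle";

// On shutdown (or periodically), save the latest handle
if (session1.LastSessionResumptionHandle is { } h)
    await File.WriteAllTextAsync(handlePath, h);

// On startup, resume if a handle was previously saved
var resumedConfig = new LiveSetupConfig
{
    // ... same config as before ...
    SessionResumption = new LiveSessionResumptionConfig
    {
        Handle = File.Exists(handlePath) ? await File.ReadAllTextAsync(handlePath) : null,
    },
};
```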
Tool Calling
```csharp
var config = new LiveSetupConfig
{
    Model = "models/gemini-2.5-flash-native-audio-latest",
    GenerationConfig = new GenerationConfig
    {
        ResponseModalities = [GenerationConfigResponseModalitie.Audio],
    },
    Tools = [new Tool { FunctionDeclarations = [myFunction] }],
};

await using var session = await client.ConnectLiveAsync(config);
await session.SendTextAsync("What's the weather in London?");

await foreach (var message in session.ReadEventsAsync())
{
    if (message.ToolCall is { } toolCall)
    {
        await session.SendToolResponseAsync([new FunctionResponse
        {
            Name = toolCall.FunctionCalls![0].Name,
            Id = toolCall.FunctionCalls[0].Id,
            Response = new { temperature = "15C" },
        }]);
    }

    if (message.ToolCallCancellation is { } cancellation)
    {
        Console.WriteLine($"Tool calls cancelled: {string.Join(", ", cancellation.Ids!)}");
    }

    if (message.ServerContent?.TurnComplete == true)
        break;
}
```
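The example above answers only the first function call in the message, but a ToolCall may carry several. One way to handle all of them is a small dispatch table keyed by function name; the `handlers` dictionary and the `"get_weather"` tool are illustrative assumptions, while `SendToolResponseAsync` and the `FunctionResponse` shape come from the example above.

```csharp
// Hypothetical dispatch table mapping tool names to handlers
var handlers = new Dictionary<string, Func<object>>
{
    ["get_weather"] = () => new { temperature = "15C" },
};

if (message.ToolCall?.FunctionCalls is { } calls)
{
    // Build one response per recognized call, preserving each call's Id
    var responses = calls
        .Where(c => c.Name is not null && handlers.ContainsKey(c.Name))
        .Select(c => new FunctionResponse
        {
            Name = c.Name,
            Id = c.Id,
            Response = handlers[c.Name!](),
        })
        .ToArray();

    await session.SendToolResponseAsync(responses);
}
```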
Session Limits
| Scenario | Duration |
| --- | --- |
| Audio only | ~15 minutes |
| Audio + video | ~2 minutes (without compression) |
| Connection lifetime | ~10 minutes (use session resumption or ResilientLiveSession) |
Samples