# Text-to-Speech and Speech-to-Text
`GeminiClient` exposes Gemini's audio surface in two complementary forms, text-to-speech and speech-to-text, through three entry points:

| Surface | Entry point |
|---|---|
| Text-to-speech (TTS) | `client.SpeakAsync(text, voiceName, modelId, languageCode)` |
| Speech-to-text (STT, MEAI) | `((Microsoft.Extensions.AI.ISpeechToTextClient)client).GetTextAsync(stream, options)` |
| Speech-to-text (convenience) | `client.TranscribeAsync(audioData, mimeType, modelId, prompt)` |

The default TTS model is `gemini-3.1-flash-tts-preview`, which supports 200+ inline audio-control tags and 70+ languages.
## Synthesizing speech
Useful helpers shipped alongside `SpeakAsync`:

- `GeminiAudioTags`: strongly typed constants for emotion, style, delivery, and pacing tags.
- `GeminiVoices`: 30 prebuilt voice names, plus `GeminiVoices.All` for iteration.
- `client.ListTtsModelsAsync()`: live discovery of every TTS-capable model.
- `AudioResult.SampleRateHz` / `AudioResult.ParseSampleRateHz(mime)`: extract the sample rate from the response MIME type without string-mangling in caller code.

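The sketch below shows the kind of string handling that `AudioResult.ParseSampleRateHz` spares you. It assumes the response MIME type carries a `rate` parameter, as in `audio/L16;codec=pcm;rate=24000`. `TryParseRate` is a hypothetical local function, not GeminiClient's implementation.

```csharp
using System;
using System.Globalization;

Console.WriteLine(TryParseRate("audio/L16;codec=pcm;rate=24000")); // 24000
Console.WriteLine(TryParseRate("audio/wav")?.ToString() ?? "no rate"); // no rate

// Hypothetical helper: returns the positive integer "rate" parameter of a MIME type,
// or null when the parameter is absent or not a valid number.
static int? TryParseRate(string? mimeType)
{
    if (string.IsNullOrWhiteSpace(mimeType))
        return null;

    // parts[0] is the media type itself ("audio/L16"); the rest are "key=value" parameters.
    string[] parts = mimeType.Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    for (int i = 1; i < parts.Length; i++)
    {
        int eq = parts[i].IndexOf('=');
        if (eq <= 0)
            continue; // skip malformed parameters such as "rate" or "=24000"

        string key = parts[i][..eq].Trim();
        string value = parts[i][(eq + 1)..].Trim();

        // Parameter names are case-insensitive; the value must be plain digits.
        if (key.Equals("rate", StringComparison.OrdinalIgnoreCase) &&
            int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out int hz) &&
            hz > 0)
        {
            return hz;
        }
    }

    return null;
}
```

Returning `null` rather than a guessed default keeps the decision about a fallback rate with the caller.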
## Transcribing through MEAI
`GeminiClient` implements `Microsoft.Extensions.AI.ISpeechToTextClient`, so anything that consumes that interface can swap providers without code changes.
The implementation detects WAV, Ogg, FLAC, and MP3 input by sniffing the leading magic bytes, and falls back to `audio/wav` when none of them match. If you already know the format, pass a custom MIME type via `SpeechToTextOptions.RawRepresentationFactory` instead.
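The sketch below illustrates magic-byte sniffing. It approximates the technique and is not GeminiClient's code. `SniffAudioMimeType` and `HasAscii` are hypothetical names. The signatures checked are the standard ones:

- `RIFF` with `WAVE` at offset 8 for WAV.
- `OggS` for Ogg.
- `fLaC` for FLAC.
- Either an `ID3` tag or an MPEG frame sync for MP3.

```csharp
using System;

byte[] oggHeader = { (byte)'O', (byte)'g', (byte)'g', (byte)'S', 0x00, 0x02 };
Console.WriteLine(SniffAudioMimeType(oggHeader));                       // audio/ogg
Console.WriteLine(SniffAudioMimeType(new byte[] { 0x01, 0x02, 0x03 })); // audio/wav (fallback)

// True when `signature` (ASCII) appears in `data` starting at `offset`.
static bool HasAscii(ReadOnlySpan<byte> data, int offset, string signature)
{
    if (data.Length < offset + signature.Length)
        return false;
    for (int i = 0; i < signature.Length; i++)
    {
        if (data[offset + i] != (byte)signature[i])
            return false;
    }
    return true;
}

// Hypothetical sniffer: maps well-known container signatures to MIME types.
static string SniffAudioMimeType(ReadOnlySpan<byte> data)
{
    if (HasAscii(data, 0, "RIFF") && HasAscii(data, 8, "WAVE"))
        return "audio/wav";
    if (HasAscii(data, 0, "OggS"))
        return "audio/ogg";
    if (HasAscii(data, 0, "fLaC"))
        return "audio/flac";

    // MP3 either starts with an ID3v2 tag or directly with an MPEG frame sync
    // (11 set bits: 0xFF, then the top 3 bits of the next byte). A stricter check
    // would also inspect the layer bits so ADTS AAC streams are not misclassified.
    if (HasAscii(data, 0, "ID3"))
        return "audio/mpeg";
    if (data.Length >= 2 && data[0] == 0xFF && (data[1] & 0xE0) == 0xE0)
        return "audio/mpeg";

    // Nothing matched: fall back to WAV, mirroring the behavior described above.
    return "audio/wav";
}
```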
## Round-trip walk-through
The full TTS → save WAV → STT flow is wired up in the runnable sample `samples/AudioRoundTrip`.
The sample:

- Synthesizes speech with `SpeakAsync`, defaulting to `GeminiVoices.Puck`.
- Wraps the returned PCM in a WAV header and saves it as `audio_round_trip.wav` so you can play it back locally.
- Calls `ISpeechToTextClient.GetTextAsync` on the WAV stream and prints the transcribed text, proving that the new STT interface plugs into any MEAI-aware pipeline.

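The WAV-wrapping step can be approximated with a canonical 44-byte RIFF header. This sketch assumes the PCM is 16-bit little-endian mono at 24 kHz, which is typical for Gemini TTS output. In real code, take the rate from `AudioResult.SampleRateHz`. `WrapPcmAsWav` is a hypothetical helper, not the sample's actual code.

```csharp
using System;
using System.IO;
using System.Text;

byte[] silence = new byte[24000 * 2]; // one second of 16-bit mono silence at 24 kHz
byte[] wav = WrapPcmAsWav(silence);
Console.WriteLine(wav.Length); // 48044

// Hypothetical helper: prepends a canonical RIFF/WAVE header to raw PCM bytes.
// BinaryWriter always writes little-endian, which is what the WAV format requires.
static byte[] WrapPcmAsWav(byte[] pcm, int sampleRateHz = 24000, short channels = 1, short bitsPerSample = 16)
{
    int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
    int byteRate = sampleRateHz * blockAlign;      // bytes per second

    using var ms = new MemoryStream(44 + pcm.Length);
    using var w = new BinaryWriter(ms, Encoding.ASCII);

    w.Write(Encoding.ASCII.GetBytes("RIFF"));
    w.Write(36 + pcm.Length);                 // RIFF chunk size: everything after this field
    w.Write(Encoding.ASCII.GetBytes("WAVE"));

    w.Write(Encoding.ASCII.GetBytes("fmt "));
    w.Write(16);                              // fmt chunk size for plain PCM
    w.Write((short)1);                        // audio format 1 = uncompressed PCM
    w.Write(channels);
    w.Write(sampleRateHz);
    w.Write(byteRate);
    w.Write((short)blockAlign);
    w.Write(bitsPerSample);

    w.Write(Encoding.ASCII.GetBytes("data"));
    w.Write(pcm.Length);                      // data chunk size
    w.Write(pcm);

    w.Flush();
    return ms.ToArray();
}
```

Passing the result to `File.WriteAllBytes` produces a playable `.wav` file. Keeping the sample rate as a parameter avoids hard-coding 24 kHz when a model reports a different rate.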
**Free-tier quota:** Gemini's free tier currently allows about 10 TTS requests per day per model. The sample handles HTTP 429 responses cleanly, printing an explanatory message instead of throwing.
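You can get the same "explain, don't throw" behavior in your own code by filtering on the status code. This sketch makes two assumptions:

- Quota errors surface as an `HttpRequestException` whose `StatusCode` (.NET 5+) is `HttpStatusCode.TooManyRequests`. GeminiClient may report them differently.
- `TryRunWithQuotaAsync` is a hypothetical helper.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

bool ok = await TryRunWithQuotaAsync(() => Task.CompletedTask);
Console.WriteLine(ok); // True

// Hypothetical helper: runs `call`, turning an HTTP 429 into a message and a `false`
// result. Any other exception, including other HTTP errors, propagates unchanged.
static async Task<bool> TryRunWithQuotaAsync(Func<Task> call)
{
    try
    {
        await call();
        return true;
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests)
    {
        Console.Error.WriteLine(
            "Gemini quota exhausted (HTTP 429). The free tier allows only a small number " +
            "of TTS requests per day per model; try again later.");
        return false;
    }
}
```

The exception filter (`when`) leaves non-quota failures unhandled, so genuine bugs still surface.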