The ElevenLabs SDK implements ITextToSpeechClient and ISpeechToTextClient from Microsoft.Extensions.AI.
Supported Interfaces
| Interface | Support Level |
| --- | --- |
| ITextToSpeechClient | Full (Flash/Turbo TTS, binary output, streamed audio chunks) |
| ISpeechToTextClient | Full (file-based and streaming transcription) |
ITextToSpeechClient
Generate speech through the standard Microsoft.Extensions.AI (MEAI) interface:
```csharp
using ElevenLabs;
using Microsoft.Extensions.AI;

using var client = new ElevenLabsClient(apiKey);
ITextToSpeechClient ttsClient = client;

var response = await ttsClient.GetAudioAsync(
    "ElevenLabs Flash is available through Microsoft.Extensions.AI.",
    new TextToSpeechOptions
    {
        ModelId = "eleven_flash_v2_5",
        VoiceId = "your-voice-id",
        AudioFormat = "mp3",
        Speed = 1.05f,
    });

var audio = response.Contents.OfType<DataContent>().Single();
File.WriteAllBytes("elevenlabs.mp3", audio.Data.ToArray());
```
Stream audio chunks with the same abstraction:
```csharp
await foreach (var update in ttsClient.GetStreamingAudioAsync(
    "Streaming text-to-speech starts returning audio before the full response is buffered.",
    new TextToSpeechOptions
    {
        ModelId = "eleven_flash_v2_5",
        VoiceId = "your-voice-id",
        AudioFormat = "mp3",
        AdditionalProperties = new()
        {
            [ElevenLabsTextToSpeechPropertyNames.OptimizeStreamingLatency] = 3,
        },
    }))
{
    foreach (var chunk in update.Contents.OfType<DataContent>())
    {
        Console.WriteLine($"{update.Kind}: {chunk.Data.Length} bytes");
    }
}
```
Use ElevenLabsTextToSpeechPropertyNames for provider-specific settings such as exact output formats, latency optimization, text normalization, stability, similarity boost, style, and speaker boost.
GetStreamingTextAsync delegates to the non-streaming GetTextAsync method internally. The Scribe API processes the audio synchronously, and then the full result is converted to SpeechToTextResponseUpdate events using ToSpeechToTextResponseUpdates().
This means you will not receive incremental transcription updates as audio is processed. The entire transcript is returned at once after processing completes. For most use cases, calling GetTextAsync directly is equivalent and simpler.
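The delegation pattern described above can be illustrated with a small self-contained sketch. It does not use the SDK and is not the SDK's implementation: `TranscribeAsync` and `TranscribeStreamingAsync` are hypothetical stand-ins, and the delay only simulates processing time. The point is that a streaming-shaped method that awaits a batch call yields a single update carrying the whole result.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a batch transcription call: the whole
// transcript becomes available only after processing finishes.
static async Task<string> TranscribeAsync(string audioPath)
{
    await Task.Delay(100); // simulates server-side processing time
    return $"full transcript of {audioPath}";
}

// Streaming-shaped wrapper that delegates to the batch call, so the
// caller sees exactly one update containing the complete text.
static async IAsyncEnumerable<string> TranscribeStreamingAsync(string audioPath)
{
    var text = await TranscribeAsync(audioPath);
    yield return text;
}

// Collect every update the "stream" produces.
var updates = new List<string>();
await foreach (var update in TranscribeStreamingAsync("sample.mp3"))
{
    updates.Add(update);
}

Console.WriteLine($"updates received: {updates.Count}");
Console.WriteLine(updates[0]);
```

Running the sketch reports one update, and that update holds the entire transcript. A consumer written against the streaming method therefore gains no latency benefit over awaiting the batch call directly.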
Note
ElevenLabs does offer a real-time streaming WebSocket API for speech-to-text. Use client.ConnectRealtimeAsync() to access real-time streaming with interim and committed transcript events.
Transcription with Language Hint
Specify a language for more accurate transcription: