Microsoft.Extensions.AI Integration
The ElevenLabs SDK implements ISpeechToTextClient from Microsoft.Extensions.AI.
Supported Interfaces
Interface | Support Level
ISpeechToTextClient | Full (file-based and streaming transcription)
ISpeechToTextClient
Installation
dotnet add package ElevenLabs
File-Based Transcription
Transcribe an audio file to text:
    using ElevenLabs;
    using Microsoft.Extensions.AI;

    // apiKey holds your ElevenLabs API key (load it from configuration or the environment).
    using var client = new ElevenLabsClient(apiKey);
    ISpeechToTextClient sttClient = client;

    var audioBytes = await File.ReadAllBytesAsync("audio.mp3");

    // GetTextAsync takes the audio as a stream; the options select the model.
    var response = await sttClient.GetTextAsync(
        new MemoryStream(audioBytes),
        new SpeechToTextOptions
        {
            ModelId = "scribe_v1",
        });

    Console.WriteLine(response.Text);
Streaming Transcription
Process transcription results via the streaming interface:
    ISpeechToTextClient sttClient = client;

    var audioBytes = await File.ReadAllBytesAsync("audio.mp3");

    // GetStreamingTextAsync takes the audio as a stream, like GetTextAsync.
    await foreach (var update in sttClient.GetStreamingTextAsync(
        new MemoryStream(audioBytes),
        new SpeechToTextOptions
        {
            ModelId = "scribe_v1",
        }))
    {
        Console.Write(update.Text);
    }
Streaming Behavior
GetStreamingTextAsync delegates to the non-streaming GetTextAsync method internally. The Scribe API processes the audio synchronously, and then the full result is converted to SpeechToTextResponseUpdate events using ToSpeechToTextResponseUpdates().
This means you will not receive incremental transcription updates as audio is processed. The entire transcript is returned at once after processing completes. For most use cases, calling GetTextAsync directly is equivalent and simpler.
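The delegation described above can be pictured with a small sketch. It does not reproduce the SDK's source: TranscribeAsync and TranscribeStreamingAsync are hypothetical stand-ins, the transcript is hard-coded, and the MEAI001 diagnostic ID is an assumption about how your Microsoft.Extensions.AI version flags the experimental speech-to-text types. Only SpeechToTextResponse and ToSpeechToTextResponseUpdates() come from Microsoft.Extensions.AI.

```csharp
#pragma warning disable MEAI001 // assumed ID of the experimental-API diagnostic

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical stand-in for the blocking call: returns the whole transcript at once.
static Task<SpeechToTextResponse> TranscribeAsync(Stream audioStream) =>
    Task.FromResult(new SpeechToTextResponse("hello world"));

// The "streaming" variant awaits the full response first,
// then replays it as SpeechToTextResponseUpdate items.
static async IAsyncEnumerable<SpeechToTextResponseUpdate> TranscribeStreamingAsync(Stream audioStream)
{
    var response = await TranscribeAsync(audioStream);
    foreach (var chunk in response.ToSpeechToTextResponseUpdates())
    {
        yield return chunk;
    }
}

using var audio = new MemoryStream(new byte[] { 0 }); // placeholder bytes, not real audio
var updateCount = 0;
var transcript = new StringBuilder();

await foreach (var update in TranscribeStreamingAsync(audio))
{
    updateCount++;
    transcript.Append(update.Text);
}

Console.WriteLine($"{updateCount} update(s), transcript: {transcript}");
```

Because nothing is yielded until the awaited call completes, the consumer sees no text until the whole transcript is ready, which is why the streaming call brings no latency benefit here.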
Note
ElevenLabs does offer a real-time streaming WebSocket API for speech-to-text. Use client.ConnectRealtimeAsync() to access real-time streaming with interim and committed transcript events.
Transcription with Language Hint
Specify a language for more accurate transcription:
    ISpeechToTextClient sttClient = client;

    var audioBytes = await File.ReadAllBytesAsync("french-audio.mp3");

    var response = await sttClient.GetTextAsync(
        new MemoryStream(audioBytes),
        new SpeechToTextOptions
        {
            ModelId = "scribe_v1",
            // Language hint passed through to the ElevenLabs API.
            AdditionalProperties = new AdditionalPropertiesDictionary
            {
                ["language_code"] = "fr",
            },
        });

    Console.WriteLine(response.Text);
Advanced Configuration with RawRepresentationFactory
Use RawRepresentationFactory to access ElevenLabs-specific features like speaker diarization, timestamps, and custom vocabulary:
    ISpeechToTextClient sttClient = client;

    var audioBytes = await File.ReadAllBytesAsync("audio.mp3");

    var response = await sttClient.GetTextAsync(
        new MemoryStream(audioBytes),
        new SpeechToTextOptions
        {
            RawRepresentationFactory = _ => new BodySpeechToTextV1SpeechToTextPost
            {
                File = audioBytes,
                ModelId = BodySpeechToTextV1SpeechToTextPostModelId.ScribeV1,
                LanguageCode = "en",
                Diarize = true,
                TagAudioEvents = true,
                TimestampsGranularity = BodySpeechToTextV1SpeechToTextPostTimestampsGranularity.Word,
            },
        });

    Console.WriteLine(response.Text);
Accessing the Raw Response
The full ElevenLabs response is available via RawRepresentation for word-level timestamps, speaker labels, and audio events:
    var response = await sttClient.GetTextAsync(
        new MemoryStream(audioBytes),
        new SpeechToTextOptions
        {
            ModelId = "scribe_v1",
        });

    // Access the provider-specific response (discriminated union)
    if (response.RawRepresentation is AnyOf<SpeechToTextChunkResponseModel,
                                            MultichannelSpeechToTextResponseModel,
                                            SpeechToTextWebhookResponseModel> result)
    {
        if (result.IsValue1 && result.Value1 is { } chunk)
        {
            Console.WriteLine($"Language: {chunk.LanguageCode}");

            // Access word-level timestamps and speaker diarization
            foreach (var word in chunk.Words)
            {
                Console.WriteLine($"  [{word.Start:F2}s - {word.End:F2}s] {word.Text} " +
                                  $"(type: {word.Type}, speaker: {word.SpeakerId})");
            }
        }
    }
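Once word-level speaker labels are available, a common next step is to merge consecutive words from the same speaker into transcript lines. The sketch below shows one way to do that. It does not use the SDK's word model: the (Text, Speaker) tuples and the GroupBySpeaker helper are hypothetical, and it assumes the list holds only spoken words, with any spacing or audio-event entries already filtered out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for chunk.Words (text plus speaker label).
var words = new List<(string Text, string Speaker)>
{
    ("Hello", "speaker_0"),
    ("there.", "speaker_0"),
    ("Hi!", "speaker_1"),
    ("How", "speaker_0"),
    ("are", "speaker_0"),
    ("you?", "speaker_0"),
};

// Merge runs of consecutive words from the same speaker into one line each.
static List<string> GroupBySpeaker(IEnumerable<(string Text, string Speaker)> items)
{
    var lines = new List<string>();
    var buffer = new List<string>();
    var current = string.Empty;

    foreach (var (text, speaker) in items)
    {
        // A speaker change closes the line collected so far.
        if (speaker != current && buffer.Count > 0)
        {
            lines.Add($"{current}: {string.Join(" ", buffer)}");
            buffer.Clear();
        }
        current = speaker;
        buffer.Add(text);
    }

    // Flush the final run.
    if (buffer.Count > 0)
    {
        lines.Add($"{current}: {string.Join(" ", buffer)}");
    }
    return lines;
}

foreach (var line in GroupBySpeaker(words))
{
    Console.WriteLine(line);
}
// Prints:
// speaker_0: Hello there.
// speaker_1: Hi!
// speaker_0: How are you?
```

Grouping by runs rather than by speaker ID keeps the conversation in chronological order, so a speaker who talks twice gets two separate lines.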
Token Usage
Usage information is available on the response:
    // audioBytes and options are prepared as in the examples above.
    var response = await sttClient.GetTextAsync(new MemoryStream(audioBytes), options);

    Console.WriteLine($"Audio duration: {response.Usage?.InputTokenCount}");
    Console.WriteLine($"Characters: {response.Usage?.OutputTokenCount}");