Advanced Configuration with RawRepresentationFactory
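Set RawRepresentationFactory on SpeechToTextOptions to supply a Cartesia-specific request, which gives access to features like audio encoding and timestamp granularity. The example below requests word-level timestamps: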
    ISpeechToTextClient sttClient = client;

    using var audioStream = File.OpenRead("audio.wav");
    var response = await sttClient.GetTextAsync(audioStream, new SpeechToTextOptions
    {
        RawRepresentationFactory = _ => new SttTranscribeRequest
        {
            Model = "ink-whisper",
            Language = SttTranscribeRequestLanguage.En,
            TimestampGranularities = [TimestampGranularity.Word],
        },
    });

    Console.WriteLine(response.Text);

    // Access word-level timestamps from the raw response
    var raw = (TranscriptionResponse)response.RawRepresentation!;
    if (raw.Words is { Count: > 0 } words)
    {
        foreach (var word in words)
        {
            Console.WriteLine($"  [{word.Start:F2}s - {word.End:F2}s] {word.Word}");
        }
    }
Streaming Behavior
GetStreamingTextAsync delegates internally to the non-streaming GetTextAsync method. The Cartesia STT API processes audio synchronously (no polling needed), and the full result is then converted to SpeechToTextResponseUpdate events using ToSpeechToTextResponseUpdates().
This means you will not receive incremental transcription updates as audio is processed. The entire transcript is returned at once after processing completes. For most use cases, calling GetTextAsync directly is equivalent and simpler.
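The delegation pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the real MEAI or Cartesia types: SketchStt, its two methods, and the "hello world" transcript are hypothetical stand-ins, and emitting exactly one update is a simplification made for the illustration. It only shows why a streaming method that wraps a non-streaming call yields nothing until the whole transcript is ready.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Consume the "stream"; nothing arrives until the full transcript exists.
var updates = new List<string>();
await foreach (var update in SketchStt.GetStreamingTextAsync("audio.wav"))
{
    updates.Add(update);
}
Console.WriteLine($"{updates.Count} update(s): {string.Join(" ", updates)}");
// prints "1 update(s): hello world"

// Hypothetical stand-in for the adapter, not the real library types.
static class SketchStt
{
    // Non-streaming call: completes only once the whole transcript is ready.
    public static async Task<string> GetTextAsync(string path)
    {
        await Task.Delay(10); // simulates the single request/response round trip
        return "hello world"; // placeholder transcript
    }

    // "Streaming" call that delegates to GetTextAsync and replays the result.
    public static async IAsyncEnumerable<string> GetStreamingTextAsync(string path)
    {
        var text = await GetTextAsync(path);
        yield return text; // the entire transcript is emitted as one update
    }
}
```

Because the loop body runs only after GetTextAsync has completed, the await foreach form offers no latency advantage over awaiting the non-streaming call directly.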
Accessing the Underlying Client
Retrieve the underlying CartesiaClient through the Microsoft.Extensions.AI (MEAI) interface by calling GetService:
    ISpeechToTextClient sttClient = client;

    var metadata = sttClient.GetService<SpeechToTextClientMetadata>();
    Console.WriteLine($"Provider: {metadata?.ProviderName}"); // "cartesia"

    var cartesiaClient = sttClient.GetService<CartesiaClient>();
    // Use cartesiaClient for Cartesia-specific APIs (TTS, voice cloning, agents, etc.)