Microsoft.Extensions.AI Integration
Cross-SDK comparison
See the centralized MEAI documentation for feature matrices and comparisons across all tryAGI SDKs.
The FishAudio SDK implements ITextToSpeechClient and ISpeechToTextClient, and provides AIFunction tool wrappers, all compatible with Microsoft.Extensions.AI.
ITextToSpeechClient
FishAudio implements ITextToSpeechClient for text-to-speech synthesis using
the /v1/tts and /v1/tts/stream-with-timestamps endpoints. The default model
is s2.1-pro-free; pass TextToSpeechOptions.ModelId to select s2.1-pro,
s2-pro, s1, or another model string supported by Fish Audio.
Text-to-Speech Options
- ModelId: Sets the Fish Audio model header. Defaults to s2.1-pro-free.
- VoiceId: Maps to Fish Audio reference_id for a single voice model.
- AudioFormat: Supports mp3, wav, pcm, and opus.
- Speed and Volume: Map to Fish Audio prosody.speed and prosody.volume.
- RawRepresentationFactory: Pass a pre-configured TTSRequest or TTSStreamWithTimestampRequest for full control.
Use AdditionalProperties with FishAudioTextToSpeechPropertyNames for
provider-specific controls such as Latency, SampleRate, Temperature,
ChunkLength, and NormalizeLoudness.
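As a rough illustration of the options described above, the following sketch builds a TextToSpeechOptions instance that selects a non-default model and a voice. It relies only on the ModelId and VoiceId properties named in this section and assumes both are plain strings; the helper name BuildOptions and the class name are hypothetical, and the MEAI001 diagnostic ID is an assumption about how the speech abstractions are flagged as experimental in your package version.

```csharp
#pragma warning disable MEAI001 // assumption: the speech abstractions are flagged experimental under this ID
using Microsoft.Extensions.AI;

public static class TextToSpeechOptionsSketch
{
    // Hypothetical helper: selects the s2.1-pro model instead of the
    // documented default (s2.1-pro-free) and an optional voice.
    public static TextToSpeechOptions BuildOptions(string? voiceId = null)
    {
        return new TextToSpeechOptions
        {
            ModelId = "s2.1-pro", // one of the model strings listed in this section
            VoiceId = voiceId,    // mapped by the SDK to Fish Audio reference_id
        };
    }
}
```

The resulting object would then be passed as the options argument of the ITextToSpeechClient call; provider-specific settings such as Latency go into AdditionalProperties and are not shown here because their value types are not stated in this section.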
Streaming Text-to-Speech
GetStreamingAudioAsync uses Fish Audio's timestamp streaming endpoint and emits
MEAI TextToSpeechResponseUpdate events. Audio chunks arrive as DataContent,
and the provider event is available in RawRepresentation.
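The streaming flow described above might be consumed along these lines. This is a sketch under several assumptions: that GetStreamingAudioAsync takes the text, an optional TextToSpeechOptions, and a CancellationToken, and that each TextToSpeechResponseUpdate exposes its payload through a Contents collection in the same way the speech-to-text update type does. The class and method names are hypothetical, and the ITextToSpeechClient instance is expected to come from the FishAudio SDK client.

```csharp
#pragma warning disable MEAI001 // assumption: the speech abstractions are flagged experimental under this ID
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class StreamingTextToSpeechSketch
{
    // Writes every audio chunk to a file as it arrives.
    public static async Task SaveStreamedSpeechAsync(
        ITextToSpeechClient tts,
        string text,
        string outputPath,
        TextToSpeechOptions? options = null,
        CancellationToken ct = default)
    {
        await using FileStream file = File.Create(outputPath);

        await foreach (TextToSpeechResponseUpdate update in tts.GetStreamingAudioAsync(text, options, ct))
        {
            // Assumption: audio arrives as DataContent items in update.Contents;
            // updates that carry no audio are skipped.
            foreach (DataContent chunk in update.Contents.OfType<DataContent>())
            {
                await file.WriteAsync(chunk.Data, ct);
            }
        }
    }
}
```

Writing chunks as they arrive keeps memory use flat for long inputs; the provider event in RawRepresentation can be inspected inside the loop if timestamp data is needed.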
ISpeechToTextClient
FishAudio implements ISpeechToTextClient for speech-to-text transcription
using the /v1/asr endpoint.
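A minimal transcription sketch, written against the Microsoft.Extensions.AI ISpeechToTextClient interface rather than any FishAudio-specific type, could look like this. The class and method names are hypothetical, the client instance is assumed to be obtained from the FishAudio SDK, and the MEAI001 diagnostic ID is an assumption about your package version.

```csharp
#pragma warning disable MEAI001 // assumption: the speech abstractions are flagged experimental under this ID
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class SpeechToTextSketch
{
    // Sends one audio file for transcription and returns the transcript text.
    public static async Task<string> TranscribeFileAsync(
        ISpeechToTextClient stt,
        string audioPath,
        CancellationToken ct = default)
    {
        await using FileStream audio = File.OpenRead(audioPath);
        SpeechToTextResponse response = await stt.GetTextAsync(audio, cancellationToken: ct);
        return response.Text;
    }
}
```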
Options
- SpeechLanguage: Set the transcription language (e.g., "en", "zh").
- RawRepresentationFactory: Pass a pre-configured CreateAsrRequest for full control.
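To illustrate the SpeechLanguage option listed above, the sketch below passes a language code through SpeechToTextOptions. It works on any readable audio stream; the class and method names are hypothetical, and the shape of CreateAsrRequest is deliberately not shown because its properties are not described in this section.

```csharp
#pragma warning disable MEAI001 // assumption: the speech abstractions are flagged experimental under this ID
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class SpeechToTextOptionsSketch
{
    // Transcribes the stream while hinting the spoken language, e.g. "en" or "zh".
    public static async Task<string> TranscribeInLanguageAsync(
        ISpeechToTextClient stt,
        Stream audio,
        string language,
        CancellationToken ct = default)
    {
        var options = new SpeechToTextOptions { SpeechLanguage = language };
        SpeechToTextResponse response = await stt.GetTextAsync(audio, options, ct);
        return response.Text;
    }
}
```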
Advanced Configuration with RawRepresentationFactory
Use RawRepresentationFactory to access Fish Audio-specific ASR features such as timestamp control and language selection.
Accessing the Raw Response
The full Fish Audio ASR response is available via RawRepresentation, which exposes segment-level timestamps and duration.
Streaming Behavior
GetStreamingTextAsync delegates to the non-streaming GetTextAsync method internally. The Fish Audio ASR API processes audio synchronously, and then the full result is converted to SpeechToTextResponseUpdate events using ToSpeechToTextResponseUpdates().
This means you will not receive incremental transcription updates as audio is processed. The entire transcript is returned at once after processing completes. For most use cases, calling GetTextAsync directly is equivalent and simpler.
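The delegation pattern described above can be sketched as follows. This is not the SDK's actual source, only an illustration of the same idea built on the public Microsoft.Extensions.AI types: call GetTextAsync once, then replay the finished response as updates. The class and method names are hypothetical.

```csharp
#pragma warning disable MEAI001 // assumption: the speech abstractions are flagged experimental under this ID
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.Extensions.AI;

public static class StreamingDelegationSketch
{
    // Produces "streaming" updates from a single non-streaming call.
    public static async IAsyncEnumerable<SpeechToTextResponseUpdate> StreamViaGetTextAsync(
        ISpeechToTextClient stt,
        Stream audio,
        SpeechToTextOptions? options = null,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        // The whole transcript is available only after this await completes.
        SpeechToTextResponse response = await stt.GetTextAsync(audio, options, ct);

        // Convert the finished response into update events.
        foreach (SpeechToTextResponseUpdate update in response.ToSpeechToTextResponseUpdates())
        {
            yield return update;
        }
    }
}
```

A caller iterating this sequence sees no output until the full transcription has finished, which is why the streaming and non-streaming paths are practically equivalent here.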
Available Tools
| Method | Tool Name | Description |
|---|---|---|
| AsTextToSpeechTool() | FishAudioTextToSpeech | Converts text to speech audio using Fish Audio's TTS API. |
| AsListModelsTool() | FishAudioListModels | Lists available voice models, optionally filtered by title. |
| AsGetModelTool() | FishAudioGetModel | Gets details for a specific voice model by ID. |
Usage
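One way to hand these tools to a chat model is sketched below. It assumes the tools have already been created with AsTextToSpeechTool(), AsListModelsTool(), and AsGetModelTool() from the table above and collected into a list; how the FishAudio client itself is constructed is not shown. The class and method names are hypothetical, and UseFunctionInvocation requires the Microsoft.Extensions.AI package rather than only the abstractions package.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class FishAudioToolsSketch
{
    // Sends a prompt to any IChatClient with the Fish Audio tools attached.
    public static async Task<string> AskWithToolsAsync(
        IChatClient innerChatClient,
        IList<AITool> fishAudioTools,
        string prompt,
        CancellationToken ct = default)
    {
        // Wrap the chat client so tool calls requested by the model are executed automatically.
        IChatClient chatClient = innerChatClient
            .AsBuilder()
            .UseFunctionInvocation()
            .Build();

        var options = new ChatOptions { Tools = fishAudioTools };
        ChatResponse response = await chatClient.GetResponseAsync(prompt, options, ct);
        return response.Text;
    }
}
```

Passing the tools through ChatOptions keeps them scoped to a single request, so different prompts can expose different subsets of the Fish Audio tools.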
Tool Details
FishAudioTextToSpeech
Converts text to speech using Fish Audio's TTS API. Supports 50+ languages,
voice cloning, and multi-speaker synthesis. Accepts an optional referenceId parameter
to specify a default voice model, and a model parameter to choose between s2-pro (default)
and s1.
FishAudioListModels
Lists available voice models from Fish Audio. Returns model IDs, titles, descriptions,
languages, and popularity metrics. Accepts a pageSize parameter (default: 10).
FishAudioGetModel
Gets details for a specific Fish Audio voice model by its ID. Returns the model title, description, languages, state, tags, and sample information.