Microsoft has been adding more realistic AI voices for customers of its Azure AI Speech service over the past year. Today, the company announced another round of additions and improvements to its AI voice lineup.
In a blog post, Microsoft stated that it has added more multilingual voices to Azure AI Speech. It said:
These voices are crafted from a variety of source languages, bringing a rich diversity of personas to enhance your user experience. With their authentic and natural interactions, they promise to transform your chatbot engagement through our technology.
The new AI voices include:
- en-GB-AdaMultilingualNeural - en-GB (English – United Kingdom) - Female
- en-GB-OllieMultilingualNeural - en-GB (English – United Kingdom) - Male
- pt-BR-ThalitaMultilingualNeural - pt-BR (Portuguese – Brazil) - Female
- es-ES-IsidoraMultilingualNeural - es-ES (Spanish – Spain) - Female
- es-ES-ArabellaMultilingualNeural - es-ES (Spanish – Spain) - Female
- it-IT-IsabellaMultilingualNeural - it-IT (Italian – Italy) - Female
- it-IT-MarcelloMultilingualNeural - it-IT (Italian – Italy) - Male
- it-IT-AlessioMultilingualNeural - it-IT (Italian – Italy) - Male
In addition, Microsoft has added two more optimized US English voices to Azure AI Speech, built specifically for use in call centers:
- en-US-LunaNeural - en-US (English – United States) - Female
- en-US-KaiNeural - en-US (English – United States) - Male
Microsoft has made all of these voices available as a public preview in the East US, West Europe, and Southeast Asia Azure regions.
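For developers who want to try one of the new voices, a minimal sketch with the Azure Speech SDK for Python might look like the following. The voice name comes from the list above; the subscription key and region values are placeholders you would replace with your own.

```python
# Minimal sketch: synthesize speech with one of the newly announced multilingual
# voices using the Azure Speech SDK for Python (pip install azure-cognitiveservices-speech).
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY",  # placeholder
    region="eastus",                 # one of the preview regions listed above
)
# Select one of the new multilingual voices from the announcement.
speech_config.speech_synthesis_voice_name = "en-GB-AdaMultilingualNeural"

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Hello! This is a preview of the new multilingual voices.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis completed.")
else:
    print(f"Synthesis did not complete: {result.reason}")
```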
The company also revealed today that it has added five new realistic-looking text-to-speech human avatars for Azure AI Speech users. It also announced that those avatars can now be paired with the Azure OpenAI GPT-4o model:
The Azure OpenAI GPT-4o model is now part of the live chat avatar application in Speech Studio. This allows users to see firsthand the collaborative functioning of the live chat avatar and Azure OpenAI GPT-4o. Additionally, we provide sample code to aid in integrating the text-to-speech avatar with the GPT-4o model.
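Microsoft's own sample code covers the avatar integration itself; as a rough illustration of the underlying pattern only, the sketch below gets a reply from an Azure OpenAI GPT-4o deployment and speaks it with Azure AI Speech. The endpoint, keys, and deployment name are placeholders, not values from the announcement.

```python
# Hedged sketch of the general GPT-4o + text-to-speech pattern (not Microsoft's
# avatar sample): fetch a chat reply from an Azure OpenAI GPT-4o deployment,
# then synthesize it with Azure AI Speech.
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR_RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR_AOAI_KEY",                                  # placeholder
    api_version="2024-02-01",
)

# Ask the GPT-4o deployment for a reply (the deployment name is an assumption).
chat = client.chat.completions.create(
    model="gpt-4o",  # your Azure OpenAI deployment name
    messages=[{"role": "user", "content": "Give me a one-sentence greeting."}],
)
reply = chat.choices[0].message.content

# Speak the reply with one of the new call-center voices.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
speech_config.speech_synthesis_voice_name = "en-US-LunaNeural"
speechsdk.SpeechSynthesizer(speech_config=speech_config).speak_text_async(reply).get()
```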
Finally, Microsoft revealed a new Text Stream API designed to help speed up text-to-speech functions:
The Text Stream API represents a significant leap forward from traditional non-text stream TTS technologies. By accepting input in chunks (as opposed to whole responses), it significantly reduces the latency that typically hinders seamless audio synthesis. The Text Stream API not only minimizes latency but also enhances the fluidity and responsiveness of real-time speech outputs, making it an ideal choice for interactive applications, live events, and responsive AI-driven dialogues.
Developers can check out some sample code for the Text Stream API on GitHub.
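As a rough sketch of how that chunked input works, the snippet below follows the text-streaming pattern shown in Microsoft's preview SDK samples. The class names (SpeechSynthesisRequest, the TextStream input type) are taken from those samples and may change while the feature is in preview; keys, region, and the sample chunks are placeholders.

```python
# Hedged sketch of the Text Stream pattern: feed text to the synthesizer in
# chunks (for example, tokens arriving from an LLM) instead of waiting for the
# whole response, so audio playback can start sooner.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
speech_config.speech_synthesis_voice_name = "en-US-KaiNeural"
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Open a text-stream synthesis request; synthesis begins as chunks arrive.
request = speechsdk.SpeechSynthesisRequest(
    input_type=speechsdk.SpeechSynthesisRequestInputType.TextStream
)
result_future = synthesizer.speak_async(request)

# Write partial text as it becomes available.
for chunk in ["The Text Stream API ", "accepts partial text ", "to cut synthesis latency."]:
    request.input_stream.write(chunk)
request.input_stream.close()  # signal that no more text is coming

result = result_future.get()
print(result.reason)
```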