Amazon unveils a new AI voice model, Nova Sonic | TechCrunch

On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, that can natively process voice and generate natural-sounding speech. Amazon claims Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.

Nova Sonic is Amazon’s answer to newer AI voice models, such as the model powering ChatGPT’s Voice Mode, that feel more natural to speak with than the more rigid models of Alexa’s early days. Recent technological breakthroughs have made legacy models, and the digital assistants they power (such as Alexa and Apple’s Siri), seem incredibly stilted by comparison.

Nova Sonic is available through Bedrock, Amazon’s developer platform for building enterprise AI applications, via a new bidirectional streaming API. In a press release, Amazon called Nova Sonic “the most cost-effective” AI voice model on the market, at about 80% cheaper than OpenAI’s GPT-4o.

Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Rohit Prasad, Amazon’s Senior Vice President and Head Scientist of AGI.

Prasad told TechCrunch in an interview that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Prasad said, Nova Sonic excels at routing user requests to different APIs. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application, and to use the right tool to do so.
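To make the routing idea concrete, here is a minimal sketch of a request router that picks among the three kinds of tools Prasad mentions: a web lookup, a proprietary data source, and an action in an external application. Everything in it is an assumption for illustration. The tool names, keyword lists, and handler functions are hypothetical, and simple keyword matching stands in for whatever Nova Sonic actually does internally, which the article does not describe.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tool:
    name: str
    keywords: List[str]            # hypothetical trigger words, illustration only
    handler: Callable[[str], str]  # stand-in for a real API call


def web_search(request: str) -> str:
    return f"[web] would look up: {request}"


def query_internal_data(request: str) -> str:
    return f"[data] would query proprietary records for: {request}"


def call_external_app(request: str) -> str:
    return f"[action] would act in an external app: {request}"


TOOLS = [
    Tool("web_search", ["weather", "news", "score", "today"], web_search),
    Tool("internal_data", ["order", "invoice", "account"], query_internal_data),
    Tool("external_action", ["book", "schedule", "send"], call_external_app),
]


def route(request: str) -> Tuple[Optional[str], str]:
    """Pick the tool whose keywords overlap most with the request.

    Returns (tool_name, result). tool_name is None when nothing matches,
    meaning the assistant would answer without calling a tool.
    """
    words = set(re.findall(r"[a-z]+", request.lower()))
    best, best_hits = None, 0
    for tool in TOOLS:
        hits = len(words & set(tool.keywords))
        if hits > best_hits:
            best, best_hits = tool, hits
    if best is None:
        return None, "answer directly without a tool"
    return best.name, best.handler(request)


if __name__ == "__main__":
    for text in ["What is the weather today?", "Where is my order?", "Tell me a joke"]:
        print(text, "->", route(text)[0])
```

A production system would more likely let the model itself choose a tool from structured tool descriptions rather than match keywords; the sketch only shows the shape of the decision, one request in and one tool (or none) out.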

Amazon says that during a two-way conversation, Nova Sonic waits to speak “in due time,” taking into account the speaker’s pauses and interruptions. It also generates a text transcript of the user’s speech, which developers can use for a variety of applications.

According to Prasad, Nova Sonic is less prone to speech recognition errors than other AI voice models, meaning that the model is relatively good at understanding users’ intent even if they mumble, misspeak, or are in a noisy environment. On a benchmark measuring speech recognition across languages and dialects, Amazon says that Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. This means that about 4 out of every 100 words from the model differed from a human transcription in these languages.
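For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the model’s transcript into the human reference, divided by the number of words in the reference. The sketch below implements that standard definition with a word-level edit distance. It is not Amazon’s evaluation code, and it assumes plain whitespace tokenization with no text normalization, which real benchmarks usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the cat sat on the mat"
    model = "the cat sat on a mat"
    # One substituted word out of six reference words.
    print(f"WER: {word_error_rate(human, model):.1%}")
```

Under this definition, a WER of 4.2% corresponds to roughly 4.2 word-level errors per 100 words of reference transcript.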

On another benchmark measuring loud interactions with multiple participants, Amazon says Nova Sonic was more accurate in terms of WER than OpenAI’s GPT-4o-transcribe model. According to Amazon, Nova Sonic’s speed is also industry-leading, with an average perceived latency of 1.09 seconds. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, according to benchmarking by Artificial Analysis.

Prasad said Nova Sonic is part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “an AI system that can do anything on a computer that anyone can do.” Going forward, Prasad said Amazon plans to release more AI models that can understand different modalities, including images, video, and voice, and “other relevant sensory data if you bring things into the physical world.”

Amazon’s AGI division, which Prasad oversees, now seems to be playing a bigger role in the company’s product strategy. Just last week, Amazon launched a preview of Nova Act, a browser-use AI model that appears to power elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad said the company hopes to offer developers more of its internal AI models.
