OpenAI upgrades its transcription and voice-generating AI models | TechCrunch

OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve on its previous releases.

For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” may be contested, but OpenAI’s head of product, Olivier Godement, described one interpretation as a chatbot that can speak with a company’s customers.

“We’re going to see more and more agents pop up in the coming months,” Godement told TechCrunch during a briefing. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”

OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than its previous generation of speech-synthesis models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindful teacher.”

This is a “true crime-style,” weathered voice: [audio sample embedded in the original article]

Here is a sample of a female “professional” voice: [audio sample embedded in the original article]

Jeff Harris, a member of OpenAI’s product staff, told TechCrunch that the goal is to let developers tailor both the voice “experience” and “context.”

“In different contexts, you don’t just want a flat, monotonous voice,” Harris continued. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice carry that emotion. […] Our big belief here is that developers and users want to really control not just what is spoken, but how things are spoken.”
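As a rough illustration, here is how a developer might steer the model through OpenAI’s speech endpoint using the official Python SDK. This is a minimal sketch: the `instructions` parameter and the “coral” voice are assumptions drawn from OpenAI’s published API conventions and may differ from your SDK version.

```python
# Minimal sketch of steering gpt-4o-mini-tts with a natural-language
# instruction. Assumes the official `openai` Python SDK; the
# `instructions` parameter and the "coral" voice are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Sorry about the mix-up with your order -- we're fixing it now.",
    # Natural-language steering of delivery, per the article's examples:
    instructions="Speak in a calm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("apology.mp3")
```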

As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe” effectively replace the company’s long-in-the-tooth Whisper transcription model. Trained on “diverse, high-quality audio datasets,” the new models can better capture accented and varied speech, the company says, even in chaotic environments.
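Since the new models replace Whisper, a reasonable assumption is that they slot into the same transcription endpoint of the official Python SDK. A minimal sketch, with the file name as a placeholder:

```python
# Minimal sketch of transcribing audio with gpt-4o-transcribe.
# Assumes the official `openai` Python SDK; swap in
# "gpt-4o-mini-transcribe" for the smaller model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting.mp3", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```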

Harris added that the new models are far less likely to hallucinate than Whisper, which notoriously tended to fabricate words, and even entire passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

“[M]aking sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren’t filling in details that they didn’t hear,” Harris said.

However, your mileage may vary depending on the language you are transcribing.

According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means roughly three out of every 10 words the model produces will differ from a human transcription in those languages.
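For readers unfamiliar with the metric, word error rate is the word-level edit distance between a model’s transcript and a human reference, divided by the number of reference words. A minimal sketch of the computation (not OpenAI’s benchmark code):

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over
# words between a reference transcript and a model hypothesis,
# divided by the reference length. Not OpenAI's benchmark code.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution, one deletion, and one insertion over 7 reference
# words gives a WER of about 0.43:
print(wer("the cat sat on the mat ok", "the cat sit on mat ok no"))
```

A 30% WER therefore means roughly three of every 10 reference words are substituted, dropped, or wrongly inserted.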

[Chart: results of OpenAI’s internal speech recognition benchmark, showing word error rates for gpt-4o-transcribe. Image credit: OpenAI]

In a break from tradition, OpenAI doesn’t plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are “much bigger than Whisper” and thus not good candidates for an open release.

“[T]hey’re not the kind of model that you can just run locally on your laptop, like Whisper,” he continued. “[W]e want to make sure that if we’re releasing things in open source, we’re doing it thoughtfully, and we have a model that really meets that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”
