OpenAI upgrades its transcription and voice-generating AI models | TechCrunch

OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve on its previous releases.

For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” may be contested, but OpenAI’s head of product, Olivier Godement, described one interpretation as a chatbot that can speak with a company’s customers.

“We’re going to see more and more agents pop up in the coming months,” Godement told TechCrunch during a briefing. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”

OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than its previous generation of speech-synthesis models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindful teacher.”

This is a “true crime-style,” weathered voice: [audio sample embedded in the original article]

Here is a sample of a female “professional” voice: [audio sample embedded in the original article]

Jeff Harris, a member of OpenAI’s product staff, told TechCrunch that the goal is to let developers tailor both the voice “experience” and “context.”

“In different contexts, you don’t just want a flat, monotonous voice,” Harris continued. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice carry that emotion. […] Our big belief here is that developers and users want to really control not just what is spoken, but how things are spoken.”
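As a rough illustration, here is how a developer might steer the model through OpenAI’s speech endpoint using the official Python SDK. This is a minimal sketch: the `instructions` parameter and the “coral” voice are assumptions drawn from OpenAI’s published API conventions and may differ from your SDK version.

```python
# Minimal sketch of steering gpt-4o-mini-tts with a natural-language
# instruction. Assumes the official `openai` Python SDK; the
# `instructions` parameter and the "coral" voice are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Sorry about the mix-up with your order -- we're fixing it now.",
    # Natural-language steering of delivery, per the article's examples:
    instructions="Speak in a calm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("apology.mp3")
```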

As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe” effectively replace the company’s long-in-the-tooth Whisper transcription model. Trained on “diverse, high-quality audio datasets,” the new models can better capture accented and varied speech, the company says, even in chaotic environments.
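Since the new models replace Whisper, a reasonable assumption is that they slot into the same transcription endpoint of the official Python SDK. A minimal sketch, with the file name as a placeholder:

```python
# Minimal sketch of transcribing audio with gpt-4o-transcribe.
# Assumes the official `openai` Python SDK; swap in
# "gpt-4o-mini-transcribe" for the smaller model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting.mp3", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```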

Harris added that the new models are far less likely to hallucinate than Whisper, which notoriously tended to fabricate words, and even entire passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

“[M]aking sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren’t filling in details that they didn’t hear,” Harris said.

However, your mileage may vary depending on the language you are transcribing.

According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means roughly three out of every 10 words the model produces will differ from a human transcription in those languages.
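For readers unfamiliar with the metric, word error rate is the word-level edit distance between a model’s transcript and a human reference, divided by the number of reference words. A minimal sketch of the computation (not OpenAI’s benchmark code):

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over
# words between a reference transcript and a model hypothesis,
# divided by the reference length. Not OpenAI's benchmark code.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution, one deletion, and one insertion over 7 reference
# words gives a WER of about 0.43:
print(wer("the cat sat on the mat ok", "the cat sit on mat ok no"))
```

A 30% WER therefore means roughly three of every 10 reference words are substituted, dropped, or wrongly inserted.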

[Chart: results of OpenAI’s internal speech recognition benchmark, showing word error rates for gpt-4o-transcribe. Image credit: OpenAI]

In a break from tradition, OpenAI doesn’t plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are “much bigger than Whisper” and thus not good candidates for an open release.

“[T]hey’re not the kind of model that you can just run locally on your laptop, like Whisper,” he continued. “[W]e want to make sure that if we’re releasing things in open source, we’re doing it thoughtfully, and we have a model that really meets that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”
