ElevenLabs is launching its own speech-to-text model

ElevenLabs, an AI startup that just raised a large $180 million round and is mainly known for its audio generation capabilities, has taken another technological step by launching its first standalone speech-to-text model, called Scribe.

The startup, valued at $3.3 billion, already provides text-to-speech services to many other companies through its vast library of voices. Now, however, the company is moving into speech recognition to compete with Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper model.

ElevenLabs' Scribe model supports 99 languages at launch. The company places more than 25 of them in an "excellent accuracy" category, with a word error rate (WER) below 5%. That list includes English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. The remaining languages fall into the "high" (5% to 10% WER), "good" (10% to 20% WER), and "moderate" (25% to 50% WER) accuracy categories.
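To make these tiers concrete: word error rate is conventionally computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words, i.e. (substitutions + deletions + insertions) / reference words. The Python sketch below computes WER that way and maps the result onto the category bands quoted above. It is illustrative only: the function names are hypothetical, ElevenLabs has not published its scoring code, and the half-open band boundaries (plus leaving the unassigned 20% to 25% gap and anything above 50% without a tier) are assumptions that read the reported ranges literally.

```python
from typing import Optional


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical band table based on the ranges reported in the article;
# each band is treated as half-open: low <= wer < high.
TIERS = [
    ("excellent", 0.00, 0.05),
    ("high", 0.05, 0.10),
    ("good", 0.10, 0.20),
    ("moderate", 0.25, 0.50),
]


def accuracy_tier(wer: float) -> Optional[str]:
    """Return the tier name for a WER value, or None if it falls outside every band."""
    for name, low, high in TIERS:
        if low <= wer < high:
            return name
    return None
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` finds one substitution in six reference words, a WER of about 16.7%, which this sketch places in the "good" band.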

The company said the model outperforms Google's Gemini 2.0 Flash and OpenAI's Whisper Large V3 in multiple languages on the FLEURS and Common Voice benchmarks.

ElevenLabs had already built a speech-to-text component for its conversational AI agent platform, which it released last year. But this is the first time the company is releasing a standalone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about the company's work on improving its speech recognition model.

"We want to better understand what you say in a conversation. We are working on moving beyond just generating content to understanding and transcribing speech," Staniszewski said at the time. "Many people say speech-to-text is a solved problem. But for many languages, it's terrible. We think we can build a better speech recognition model because we have internal teams that annotate data and give us quick feedback."

The model also offers smart speaker diarization, which tells you who is speaking; word-level timestamps for accurate subtitles; and automatic tagging of sound events such as audience laughter. The startup also gives customers a way to transcribe video content directly in its Studio to add captions or subtitles.
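Word-level timestamps are what make subtitle generation straightforward: consecutive words can be grouped into short cues, and each cue takes the start time of its first word and the end time of its last. The Python sketch below shows that idea producing SRT-style subtitle text. The `Word` record, the `words_to_srt` helper, and the grouping limits (7 words, 3 seconds) are illustrative assumptions, not Scribe's actual output format or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word-level transcript entry (times in seconds)."""
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_duration: float = 3.0) -> str:
    """Group timestamped words into numbered SRT cues.

    A new cue starts when the current one already holds max_words words,
    or when adding the next word would stretch it past max_duration seconds.
    """
    cues: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current and (len(current) >= max_words or w.end - current[0].start > max_duration):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for idx, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(f"{idx}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)
```

Lowering `max_words` yields shorter, faster-changing cues, while the duration cap keeps a single cue from stretching across long pauses.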

Scribe is currently available only for pre-recorded audio. The company said it will soon release a low-latency, real-time version of the model. Until then, Scribe is not suited to live meeting transcription or note-taking.

ElevenLabs prices Scribe at $0.40 per hour of transcribed audio. While that price is competitive, some competitors offer lower prices for audio transcription, albeit with certain feature differences.

Source: "ElevenLabs is launching its own speech-to-text model," TechCrunch
