
ElevenLabs introduces AI Dubbing into 20 languages

by WeeklyAINews


ElevenLabs, a year-old voice cloning and synthesis startup founded by former Google and Palantir employees, today announced the launch of AI Dubbing, a dedicated product that can translate any speech, including long-form content, into more than 20 different languages.

Available to all platform users, the offering is a new way to dub audio and video content and could transform an area that has been largely manual for years.

More importantly, it can break language barriers for smaller content creators who don't have the resources to hire translators to convert their content and take it global.

“We have tested and iterated on this feature in collaboration with hundreds of content creators to dub their content and make it more accessible to wider audiences,” Mati Staniszewski, CEO and co-founder of ElevenLabs, told VentureBeat. “We see huge potential for independent creatives, such as those creating video content and podcasts, all the way through to film and TV studios.”

ElevenLabs claims the feature can deliver high-quality translated audio in minutes (depending on the length of the content) while retaining the speaker's original voice, complete with their emotions and intonation.

However, in this age of AI, when almost every enterprise is looking at language models to drive efficiencies, it's not the only company exploring speech-to-speech translation.

AI Dubbing: How it works

While AI-driven translation involves multiple layers of work, from noise removal to speech translation, users on the front end don't have to go through any of these steps. They simply select the AI Dubbing tool on ElevenLabs, create a new project, choose the source and target languages and upload the content file.


Once the content is uploaded, the tool automatically detects the number of speakers and gets to work, with a progress bar appearing on the screen. This works much like any other online conversion tool. When processing is complete, the file can be downloaded and used.

Behind the scenes, the tool uses ElevenLabs' proprietary method to remove background noise, separating music and noise from the speakers' actual dialogue. It recognizes which speaker talks when, keeping their voices distinct, and transcribes what they say in their original language using a speech-to-text model. This text is then translated, adapted (so that lengths match) and voiced in the target language to produce the desired speech while retaining the speaker's original voice characteristics.
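The stages described above can be sketched as a chain of functions. This is a minimal sketch under stated assumptions, not ElevenLabs' implementation: every model (`separate`, `diarize`, `transcribe`, `translate`, `synthesize`) is a hypothetical callable supplied by the caller, the `Segment` type and `dub` function are illustrative names, and the toy usage passes strings in place of audio so the data flow is easy to follow.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Tuple

Audio = Any  # hypothetical stand-in for a real audio buffer


@dataclass(frozen=True)
class Segment:
    """One speaker turn found by diarization; times are in seconds."""
    speaker: str
    start: float
    end: float
    text: str = ""


def dub(
    audio: Audio,
    source_lang: str,
    target_lang: str,
    separate: Callable[[Audio], Tuple[Audio, Audio]],
    diarize: Callable[[Audio], List[Segment]],
    transcribe: Callable[[Audio, Segment, str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[Segment, str], Audio],
) -> Tuple[Audio, List[Tuple[Segment, Audio]]]:
    """Chain the dubbing stages; each stage is an injected, hypothetical model."""
    # 1. Separate dialogue from music and background noise.
    speech, background = separate(audio)
    dubbed = []
    # 2. Diarize: find which speaker talks when, keeping voices distinct.
    for seg in diarize(speech):
        # 3. Speech-to-text in the source language for this turn only.
        seg = replace(seg, text=transcribe(speech, seg, source_lang))
        # 4. Translate; a real system would also adapt the text so the
        #    target-language line fits the duration seg.end - seg.start.
        translated = translate(seg.text, source_lang, target_lang)
        # 5. Voice the translation with this speaker's cloned voice.
        dubbed.append((seg, synthesize(seg, translated)))
    # 6. Return the background so the caller can remix the dubbed turns
    #    over it at each segment's start time.
    return background, dubbed


# Toy stand-ins: "audio" is a string so each stage's effect is visible.
turns = [Segment("A", 0.0, 2.0), Segment("B", 2.0, 4.5)]
bg, out = dub(
    "hello|music",
    "en",
    "es",
    separate=lambda a: tuple(a.split("|")),
    diarize=lambda s: turns,
    transcribe=lambda s, seg, lang: f"{s}-{seg.speaker}",
    translate=lambda t, src, dst: f"{t}@{dst}",
    synthesize=lambda seg, t: f"voice[{seg.speaker}]:{t}",
)
print(bg)                           # music
print([clip for _, clip in out])    # ['voice[A]:hello-A@es', 'voice[B]:hello-B@es']
```

Injecting each stage keeps the orchestration independent of any particular speech-to-text, translation or text-to-speech model, which mirrors how the article describes the product as a combination of separate capabilities.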

Finally, the translated speech is synced back with the music and background noise that were originally removed from the file, preparing the dubbed output for use. ElevenLabs says this work is the culmination of its research on voice cloning, text and audio processing and multilingual speech synthesis.
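The final remix step can be illustrated with a minimal sketch. It assumes mono audio as Python floats in [-1, 1], a shared sample rate, and dubbed clips already positioned at their original start sample (for example `int(start_seconds * sample_rate)`); the `remix` helper is hypothetical, and ElevenLabs has not published how it performs this step. Hard clipping is used only for simplicity; production mixers typically apply gain ramps or a limiter instead.

```python
from typing import List, Sequence, Tuple


def remix(
    background: Sequence[float],
    dubbed: List[Tuple[int, Sequence[float]]],
    speech_gain: float = 1.0,
) -> List[float]:
    """Overlay dubbed speech clips onto the separated background track.

    `dubbed` holds (start_sample, samples) pairs.
    """
    # Output is as long as the background, extended if a clip runs past its end.
    length = max([len(background)] + [start + len(clip) for start, clip in dubbed])
    out = list(background) + [0.0] * (length - len(background))
    # Add each dubbed clip sample-by-sample at its original position.
    for start, clip in dubbed:
        for i, sample in enumerate(clip):
            out[start + i] += speech_gain * sample
    # Hard-clip to the valid range so the result can be converted back to PCM.
    return [max(-1.0, min(1.0, s)) for s in out]


mixed = remix([0.1, 0.1, 0.1, 0.1], [(1, [0.5, 0.95]), (3, [0.2, 0.3])])
print([round(s, 2) for s in mixed])  # [0.1, 0.6, 1.0, 0.3, 0.3]
```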

To generate the final speech from the translated text, the company uses its latest Multilingual v2 model. The model currently supports more than 20 languages, including Hindi, Portuguese, Spanish, Japanese, Ukrainian, Polish and Arabic, giving users a wide range of options to globalize their content.

Before this end-to-end interface, ElevenLabs offered separate tools for voice cloning and text-to-speech synthesis. Anyone who wanted to translate their audio content, such as a podcast, into a different language first had to create a clone of their voice on the platform and transcribe and translate the audio separately. Then, using the translated text file and their cloned voice, they could produce audio from the text-to-speech model. On top of that, this only worked for speech without any significant background music or noise.


Staniszewski confirmed that the new dubbing feature will be available to all users of the platform but will be subject to character limits, as has been the case with text-to-speech generation. Around one minute of AI Dubbing typically equates to 3,000 characters, he said.
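The quoted ratio of roughly 3,000 characters per minute makes it easy to estimate how much quota a piece of content would consume. The sketch below is an illustration only: the linear ratio comes from Staniszewski's approximate figure, while rounding up to whole characters and the helper name `estimated_dubbing_characters` are assumptions, not ElevenLabs' published billing rules.

```python
import math

# Approximate ratio quoted by Staniszewski; not an official billing rate.
CHARS_PER_MINUTE = 3_000


def estimated_dubbing_characters(duration_seconds: float) -> int:
    """Approximate character quota used to dub `duration_seconds` of media."""
    # Multiply before dividing to limit float error; round up so the
    # estimate never undercounts.
    return math.ceil(duration_seconds * CHARS_PER_MINUTE / 60)


print(estimated_dubbing_characters(45 * 60))  # 135000 (a 45-minute podcast)
print(estimated_dubbing_characters(90))       # 4500 (a 90-second clip)
```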

AI-based voices are coming

While ElevenLabs is making headlines with back-to-back developments, it is not the only company exploring AI-based voice technology. A few weeks ago, Microsoft-backed OpenAI made ChatGPT multimodal, giving it the ability to hold conversations in response to voice prompts, much like Alexa.

Here, too, the company is using speech-to-text and text-to-speech models to convert audio, but the technology is not available to everyone.

OpenAI said it is working with select partners to prevent misuse of these capabilities. One of them is Spotify, which is using the technology to help its podcasters translate their content into different languages while retaining their own voices.

For his part, Staniszewski said ElevenLabs' AI Dubbing tool stands out by translating video or audio of any length, with any number of speakers, while preserving their voices and emotions across up to 20 languages and delivering the highest-quality results.

Other players are also active in the AI-powered voice and speech synthesis space, including MURF.AI, Play.ht and WellSaid Labs.

Just recently, Meta also launched SeamlessM4T, an open-source multilingual foundation model that can understand nearly 100 languages from speech or text and generate translations into either or both in real time.


According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is estimated to reach nearly $5 billion by 2032, a compound annual growth rate (CAGR) of slightly above 15.4%.
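As a quick consistency check on those figures, compounding the 2022 value over the ten years to 2032 at the stated rate gives the projected size. This treats 2022 to 2032 as ten annual compounding periods and uses r of about 0.154, a rounding of the quoted rate:

```latex
V_{2032} = V_{2022}\,(1 + r)^{10}
         \approx 1.2 \times 1.154^{10}
         \approx 1.2 \times 4.19
         \approx 5.0 \quad \text{(billion USD)}

r = \left(\frac{V_{2032}}{V_{2022}}\right)^{1/10} - 1
  \approx \left(\frac{5.0}{1.2}\right)^{1/10} - 1
  \approx 0.153
```

Both directions agree: a CAGR slightly above 15.4% turns $1.2 billion into roughly $5 billion over the decade.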



© 2023 – All Right Reserved.