
Breaking down language walls: ElevenLabs launches multilingual text-to-speech for diverse audiences

by WeeklyAINews



ElevenLabs, a year-old startup that’s leveraging the power of machine learning for voice cloning and synthesis, today announced the expansion of its platform with a new text-to-speech model that supports 30 languages.

The expansion marks the platform’s official exit from the beta phase, making it ready for use by enterprises and individuals looking to customize their content for audiences worldwide. It comes more than a month after ElevenLabs’ $19 million series A round, which valued the company at nearly $100M.

“ElevenLabs was started with the dream of making all content universally accessible in any language and in any voice. With the release of Eleven Multilingual v2, we’re one step closer to making this dream a reality and making human-quality AI voices available in every dialect,” Mati Staniszewski, CEO and cofounder of the company, said in a statement.

“Ultimately, we hope to cover even more languages and voices with the help of AI and eliminate the linguistic barriers to content,” he added.

Eleven Multilingual v2: How is it useful?

ElevenLabs offers two main voice-focused AI products: Speech Synthesis and VoiceLab.

The former is a synthesis tool that generates natural-sounding speech from text inputs. The latter is an add-on of sorts that gives users the ability to clone their own voices or generate entirely new synthetic voices (by randomly sampling vocal parameters) for use with the synthesis tool.
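Generating a voice “by randomly sampling vocal parameters” can be pictured as drawing one value per attribute from a fixed menu. The Python sketch below is purely hypothetical: the attribute names (`gender`, `age`, `accent`, `accent_strength`), their allowed values and the 0.3 to 2.0 strength range are assumptions chosen for illustration, not ElevenLabs’ documented parameter set. Seeding the random generator makes a sampled voice reproducible.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter space; NOT ElevenLabs' actual set of vocal parameters.
GENDERS = ("female", "male")
AGES = ("young", "middle_aged", "old")
ACCENTS = ("american", "british", "african", "australian", "indian")


@dataclass(frozen=True)
class VoiceParams:
    gender: str
    age: str
    accent: str
    accent_strength: float  # assumed range: 0.3 (subtle) to 2.0 (strong)


def sample_voice(rng: random.Random) -> VoiceParams:
    """Draw one value per attribute to describe a brand-new synthetic voice."""
    return VoiceParams(
        gender=rng.choice(GENDERS),
        age=rng.choice(AGES),
        accent=rng.choice(ACCENTS),
        accent_strength=round(rng.uniform(0.3, 2.0), 2),
    )


# The same seed always yields the same voice description.
print(sample_voice(random.Random(7)))
```

Keeping the sampler separate from synthesis mirrors the split described in the article: VoiceLab produces a voice, and the text-to-speech tool then uses it.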

Once users create a custom voice, they can plug it into the text-to-speech tool to convert any short- or long-form content of their choice into speech in that voice with little effort. Alternatively, they can use one of the company’s many premade AI voices or voices created and shared publicly by the community.


In the early days, the synthesis tool started off with a model that produced speech only in English. It was later expanded with Eleven Multilingual version 1, which used text inputs and AI voices to generate speech in eight languages: English, Polish, German, Spanish, French, Italian, Portuguese and Hindi.

Now, with the release of Eleven Multilingual version 2, the offering can synthesize speech in 30 languages. New additions include Korean, Dutch, Turkish, Swedish, Indonesian, Vietnamese, Filipino, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Hungarian, Norwegian, Slovak, Croatian, Arabic and Tamil.

The move essentially means a person could clone their voice and use it to produce speech in dozens of languages, targeting different markets.

According to ElevenLabs, the user enters text in the language of their choice, selects the voice they want (premade, synthetic or cloned) and adjusts a few speech parameters. The model automatically identifies the written language and uses the chosen parameters to generate speech in it. It also preserves the selected voice’s unique characteristics, including its original accent, across all languages.
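The workflow described in the previous paragraph (text in, a voice choice, a few parameters, automatic language detection) maps onto a single HTTP request. The Python sketch below builds, but does not send, such a request using only the standard library. The endpoint path, the `xi-api-key` header, the `eleven_multilingual_v2` model ID and the `stability`/`similarity_boost` settings are assumptions based on ElevenLabs’ public REST API documentation and may have changed; the voice ID and key in the usage lines are placeholders.

```python
import json
import urllib.request

# Assumption: endpoint, header and field names follow ElevenLabs' public REST API
# around the v2 launch; check the current API reference before relying on them.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for Eleven Multilingual v2.

    No language code is sent: per the article, the model detects the written
    language from the text itself.
    """
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # assumed: response body is MP3 audio
        },
        method="POST",
    )


# Usage (placeholders; sending requires a real key and voice ID):
req = build_tts_request("Hola, ¿cómo estás?", "YOUR_VOICE_ID", "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

The same voice ID can be reused with text in any supported language, which is how a single cloned voice would be carried across markets.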

“Our model is able to understand the relationships between words and adjust delivery based on context (‘contextual’ text-to-speech). Because there are no hardcoded voice features in the model, it can robustly predict thousands of voice characteristics while creating AI voices. This means the ElevenLabs model can take the text surrounding each generated utterance into account to maintain appropriate flow, rather than generating each utterance separately, which would create voices that sound robotic,” Staniszewski told VentureBeat.
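To make the idea of “contextual” text-to-speech concrete, here is a hypothetical client-side illustration: each utterance is paired with the text around it, so a model could condition on neighboring sentences instead of synthesizing each one in isolation. ElevenLabs has not detailed its internal method, so this is not it; the function name `context_windows` and the window size `k` are illustrative choices.

```python
from typing import List, Tuple


def context_windows(sentences: List[str], k: int = 1) -> List[Tuple[str, str, str]]:
    """Pair each utterance with up to k sentences of preceding and following text.

    Returns (before, utterance, after) triples; `before` and `after` are empty
    strings at the start and end of the passage.
    """
    windows = []
    for i, utterance in enumerate(sentences):
        before = " ".join(sentences[max(0, i - k):i])
        after = " ".join(sentences[i + 1:i + 1 + k])
        windows.append((before, utterance, after))
    return windows


for before, utt, after in context_windows(["It was late.", "Who's there?", "Nobody answered."]):
    print(f"[{before}] >> {utt} << [{after}]")
```

With context attached, “Who’s there?” arrives alongside “It was late.”, which is the kind of surrounding text a contextual model could use to pick a tense, questioning delivery rather than a flat one.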


Common applications of the text-to-speech tool

Since its beta launch, ElevenLabs has seen interest from both enterprises and creators and claims to have registered more than 1 million users worldwide. The latest release is expected to increase not only the platform’s user base but also the amount of content it generates daily.

“We have a variety of enterprise clients using our products, and their use cases are varied: from voicing characters in video games to voicing customer service avatars, and from recording audiobooks to creating content for the visually impaired,” Staniszewski explained.

Most recently, the company collaborated with arXiv to publish audio versions of its papers for added accessibility. It also partnered with Storytel to expand the options available for audiobooks, offering additional AI voices alongside human narrators. In the future, the CEO expects the technology could make dubbing an entire movie into multiple languages completely seamless, while preserving the accents and emotions of the original actors.

More to come

As part of this mission, ElevenLabs plans to expand its products with more languages and features, including a Projects tool that will make it easier for users to structure and edit their long-form content. According to Staniszewski, it will bring a “Google Docs” level of simplicity to generating speech from longer content.

“By the end of the year, we’re also planning to release a beta version of our AI dubbing tool, which will allow users to instantly convert speech from one language to another, all while preserving the original speakers’ voices,” he noted.


In the AI-powered voice and speech generation space, ElevenLabs competes with players like MURF.AI, Play.ht and WellSaid Labs. According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is estimated to reach nearly $5 billion by 2032, a compound annual growth rate (CAGR) of slightly above 15.4%.
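The market figures can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) − 1. The short Python check below is simple arithmetic on the numbers quoted, not something taken from the Market US report.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


# Figures from the article, in billions of US dollars (2022 to 2032 is 10 years).
print(f"{cagr(1.2, 5.0, 10):.2%}")       # 15.34%
print(f"{project(1.2, 0.154, 10):.2f}")  # 5.03
```

Compounding $1.2 billion at 15.4% for ten years gives about $5.03 billion, so the quoted market size and growth rate are roughly consistent with each other.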
