An end-to-end speech translation model

Google is actively integrating artificial intelligence into its products these days. Recently, Google AI engineers introduced Translatotron, an end-to-end speech-to-speech translation model.

Translatotron shows that a single sequence-to-sequence model can directly translate speech from one language into another. In their research paper, the team demonstrated the new speech translation model and obtained high translation quality on two Spanish-to-English datasets.

Google AI introduces Translatotron: the model architecture of Translatotron

Going a bit deeper, speech-to-speech translation systems usually consist of three components:

Speech Recognition: It is used to convert the source speech into text.
Machine Translation: It is used to translate the converted text into the target language.
Text-to-Speech Synthesis (TTS): It is used to produce speech in the target language from the translated text.
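
The three stages above can be sketched as a simple chain of functions. This is only a minimal illustration of the cascade idea, not how any real product is built: the function names, the toy lexicon, and the stand-in "audio" (a list of numbers) are all hypothetical placeholders for trained models.

```python
from typing import Callable, Dict, List

Audio = List[float]  # stand-in for a waveform; a hypothetical simplification


def cascade_translate(
    audio: Audio,
    recognize: Callable[[Audio], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], Audio],
) -> Audio:
    """Run the three cascade stages in order, passing text between them."""
    source_text = recognize(audio)         # speech recognition
    target_text = translate(source_text)   # machine translation
    return synthesize(target_text)         # text-to-speech synthesis


# Toy stand-ins so the sketch runs; real stages would be trained models.
TOY_LEXICON: Dict[str, str] = {"hola": "hello", "mundo": "world"}


def toy_recognize(audio: Audio) -> str:
    # Pretends every input utterance says "hola mundo".
    return "hola mundo"


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_synthesize(text: str) -> Audio:
    # Encodes each character as a number instead of producing real audio.
    return [float(ord(ch)) for ch in text]


output_audio = cascade_translate(
    [0.0, 0.1, -0.1], toy_recognize, toy_translate, toy_synthesize
)
```

Note that text is the only thing passed between stages, so an error made by an earlier stage is carried into the later ones.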

There are many successful speech-to-speech translation products, such as Google Translate, powered by such systems.

Google engineers have been working on this project for almost three years. The story started in 2016, when researchers demonstrated the feasibility of using a single sequence-to-sequence model for speech-to-text translation. It also made researchers realize the need for end-to-end speech translation models.

Later, in 2017, the Google AI team showed that such models can outperform conventional cascade models. Beyond Google, many other proposals have recently been made for improving end-to-end speech-to-text translation models.

Unlike cascaded systems, Translatotron does not rely on an intermediate text representation in either language. It is based on a sequence-to-sequence network that takes source spectrograms as input and then generates spectrograms of the translated content in the target language.
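
To make the word "spectrogram" concrete, here is a minimal sketch of how a waveform can be turned into a magnitude spectrogram, the kind of time-frequency grid such a network reads and writes. It is not Translatotron's actual preprocessing: the frame length, hop size, Hann window, and test tone are assumptions chosen only for illustration, and the plain-Python DFT favors readability over speed.

```python
import cmath
import math
from typing import List


def magnitude_spectrogram(
    signal: List[float], frame_len: int = 256, hop: int = 128
) -> List[List[float]]:
    """Return one row per frame, each with frame_len // 2 + 1 magnitudes."""
    # Hann window to reduce leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, keeping non-negative frequencies.
        row = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(n_bins)
        ]
        spectrogram.append(row)
    return spectrogram


# Illustrative input: a quarter second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2000)]
spec = magnitude_spectrogram(tone)

# Each bin spans sample_rate / frame_len = 31.25 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` describes which frequencies are present in one short slice of audio. A spectrogram-to-spectrogram model maps a sequence of such rows in the source language to a sequence of rows in the target language, with no text in between.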
