An end-to-end speech translation model

Google is actively integrating artificial intelligence into its products these days. Recently, Google AI engineers introduced Translatotron, an end-to-end speech-to-speech translation model.

Translatotron demonstrates that a single sequence-to-sequence AI model can directly translate speech from one language into another. In their research paper, the team presented the new speech translation model and obtained high translation quality on two Spanish-to-English datasets.

Google AI introduces Translatotron: The model architecture of Translatotron

If we go a bit deeper, speech-to-speech translation systems usually consist of three components:

Speech Recognition: It’s used to convert the source speech into text.
Machine Translation: It’s used to translate the transcribed text into the target language.
Text-to-Speech Synthesis (TTS): It’s used to generate speech in the target language from the translated text.

There are many successful speech-to-speech translation products, such as Google Translate, powered by such cascaded systems.
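
The three-stage cascade above can be sketched as a simple function composition. This is only an illustration of the data flow, not how any real product is built: the stage signatures and the toy stand-ins (toy_asr, toy_mt, toy_tts) are hypothetical and do no real recognition, translation, or synthesis.

```python
from typing import Callable

# Hypothetical stage signatures; real systems would wrap trained models.
Recognizer = Callable[[bytes], str]    # source audio -> source text
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> target audio


def make_cascade(asr: Recognizer, mt: Translator, tts: Synthesizer) -> Callable[[bytes], bytes]:
    """Chain the three stages into one speech-to-speech function."""
    def translate_speech(audio: bytes) -> bytes:
        source_text = asr(audio)        # 1. speech recognition
        target_text = mt(source_text)   # 2. machine translation
        return tts(target_text)         # 3. text-to-speech synthesis
    return translate_speech


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    lexicon = {"hola": "hello", "mundo": "world"}
    return " ".join(lexicon.get(word, word) for word in text.split())


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


cascade = make_cascade(toy_asr, toy_mt, toy_tts)
print(cascade(b"hola mundo"))  # b'hello world'
```

Note that every stage boundary passes through text, so an error made by one stage is handed on to the next. That intermediate text is what an end-to-end model does away with.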

Google engineers have been working on this project for almost three years. The story started in 2016, when researchers demonstrated the feasibility of using a single sequence-to-sequence model for speech-to-text translation. It also made researchers realize the need for end-to-end speech translation models.

Later, in 2017, the Google AI team showed that such models can outperform conventional cascade models. Beyond Google, many other proposals have recently been made for improving end-to-end speech-to-text translation models.

Unlike cascaded systems, Translatotron doesn’t rely on an intermediate text representation in either language. It’s based on a sequence-to-sequence network that takes source spectrograms as input and then generates spectrograms of the translated content in the target language.

The new end-to-end speech translation model also makes use of two separately trained components:

Neural vocoder: It converts output spectrograms to time-domain waveforms.
Speaker encoder: It maintains the source speaker’s voice in the synthesized translated speech.
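
For readers unfamiliar with spectrograms, here is a minimal sketch of how a waveform becomes one: the signal is cut into overlapping frames, and each frame is converted to frequency magnitudes. The frame length, hop size, Hann window, and test tone are all assumptions chosen for illustration. The article does not state the exact features Translatotron uses, and this sketch computes plain linear-frequency magnitudes with a slow but dependency-free DFT.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of frame_len // 2 + 1 DFT magnitudes."""
    # A Hann window reduces leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, keeping the non-negative frequencies.
        spectrogram.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return spectrogram


sample_rate = 8000  # Hz, an assumed rate for the toy signal
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]  # 1 kHz tone

spec = magnitude_spectrogram(tone)
energy_per_bin = [sum(frame[k] for frame in spec) for k in range(len(spec[0]))]
peak_bin = max(range(len(energy_per_bin)), key=energy_per_bin.__getitem__)
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)  # 24 33 1000.0
```

The result is a time-by-frequency grid (24 frames of 33 bins here), which is the kind of representation a sequence-to-sequence network can read as input and produce as output.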

The Google AI engineers validated Translatotron’s translation quality by measuring the BLEU (bilingual evaluation understudy) score, computed on text transcribed from the output speech by a speech recognition system. The results may lag behind a conventional cascade system, but the team has managed to demonstrate the feasibility of end-to-end direct speech-to-speech translation.
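
To make the metric concrete, here is a minimal single-reference BLEU sketch: clipped n-gram precisions up to 4-grams, combined by geometric mean and scaled by a brevity penalty. It is an unsmoothed, simplified version for illustration and is not the tooling the Translatotron team used. In their setup the hypothesis text would come from running speech recognition on the translated audio, whereas here it is simply given as a string.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Single-reference BLEU with uniform n-gram weights, in [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # unsmoothed: any n-gram order with no match zeroes the score
        log_precision_sum += math.log(matches / total)
    # The brevity penalty discourages hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


print(bleu("the cat sat on the mat", "the cat sat on the mat"))          # 1.0
print(round(bleu("the cat sat on the mat", "the cat sat on a mat"), 3))  # 0.537
```

A single substituted word already lowers the score noticeably, because it breaks every n-gram that spans it.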

Translatotron retains the original vocal characteristics in the translated speech by including a speaker encoder network, which makes the translated speech sound more natural.

The engineers concluded that Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language while retaining the source speaker’s voice in the translated speech. They consider this a starting point for future research on end-to-end speech-to-speech translation systems.
