Google DeepMind’s WaveNet Produces Natural Human Sound And Music

Google’s DeepMind is taking computer-generated speech to a whole new level. New software called WaveNet, from the brainiacs at DeepMind, is setting a high-water mark in the field of speech synthesis and giving AI a voice eerily similar to that of a human.

The people behind Google’s DeepMind machine learning experiment say they’ve come up with a completely new way of synthesising artificial speech, binning the traditional technique of piecing recordings together in favour of a method that works directly on real waveforms to create sounds. The result is an algorithm that learns how sounds follow each other on different timescales during speech in English and Mandarin.

Most existing text-to-speech (TTS) systems use an approach called concatenative TTS, in which the audio is generated by recombining fragments of recorded speech. Building up these samples from a wider range of human voices makes the result more realistic. The drawback is that the sound of the voice cannot be easily modified.
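To make the idea of recombining recorded fragments concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the unit names, the sample rate and the sine tones standing in for recorded speech are all hypothetical, and real concatenative systems choose among many candidate recordings per sound rather than one.

```python
import math

SAMPLE_RATE = 8000       # samples per second (assumed for this toy example)
FRAGMENT_SAMPLES = 400   # length of each stored fragment (assumed)

def make_tone(freq_hz, n_samples):
    """Stand-in for a recorded speech fragment: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

# Hypothetical unit library. A real system would hold recordings of a speaker.
UNIT_LIBRARY = {
    "HH": make_tone(200.0, FRAGMENT_SAMPLES),
    "EH": make_tone(300.0, FRAGMENT_SAMPLES),
    "L": make_tone(250.0, FRAGMENT_SAMPLES),
    "OW": make_tone(350.0, FRAGMENT_SAMPLES),
}

def concatenate_units(unit_names, library, crossfade=40):
    """Join stored fragments in order, blending each join with a linear crossfade."""
    output = []
    for name in unit_names:
        fragment = library[name]
        overlap = min(crossfade, len(output), len(fragment))
        for i in range(overlap):
            weight = (i + 1) / (overlap + 1)  # ramps toward the new fragment
            idx = len(output) - overlap + i
            output[idx] = (1 - weight) * output[idx] + weight * fragment[i]
        output.extend(fragment[overlap:])
    return output

hello = concatenate_units(["HH", "EH", "L", "OW"], UNIT_LIBRARY)
```

Because the output can only ever be assembled from what is in the library, changing the character of the voice means recording a new library, which is the drawback described above.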

Other systems form a voice electronically, based on rules about how letter combinations are pronounced.

However, the human-like voice system the DeepMind team has created is still in a beta phase and is not yet practical for real-life devices. The team put it up against a concatenative system driven by a hidden Markov model (HMM) and a parametric system based on a long short-term memory recurrent neural network (LSTM-RNN), both trained on the same data.

What’s different about WaveNet is that it can directly model the raw waveform of an audio signal, an extremely complicated task that required a novel neural network. In the last couple of years, we’ve seen computers get better at understanding human speech, while speech generation has still relied largely on concatenative systems. These work by employing a huge library of pre-recorded human sounds and phonemes with varying emphasis and emotion. That is why WaveNet is being closely watched by tech companies, according to Bloomberg.
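Modelling a raw waveform means producing the audio one sample at a time, with each new sample predicted from the samples that came before it. The toy Python sketch below shows only that feedback loop. It is not WaveNet: the `predict_next` function is a hypothetical stand-in for the neural network (a fixed two-tap rule that continues a pure tone), and the 16 kHz sample rate and 440 Hz pitch are assumptions chosen for the example.

```python
import math

# Angular step per sample for a 440 Hz tone at an assumed 16 kHz sample rate.
OMEGA = 2 * math.pi * 440.0 / 16000.0

def predict_next(history):
    """Hypothetical stand-in for a trained network: predicts the next sample
    from the two previous ones using the recurrence that continues a sinusoid."""
    return 2.0 * math.cos(OMEGA) * history[-1] - history[-2]

def generate(seed, n_samples, predictor):
    """Sample-by-sample generation: every prediction is appended to the
    waveform and becomes part of the input for the next prediction."""
    samples = list(seed)
    while len(samples) < n_samples:
        samples.append(predictor(samples))
    return samples[:n_samples]

seed = [math.sin(OMEGA * 0), math.sin(OMEGA * 1)]
waveform = generate(seed, 200, predict_next)
```

At 16,000 samples per second, even one second of audio needs 16,000 such predictions in sequence, which gives a sense of why modelling raw audio directly is so demanding.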

Dubbed WaveNet, the AI promises significant improvements to computer-generated speech, and could eventually be used in digital personal assistants such as Apple’s Siri, Microsoft’s Cortana and Amazon’s Alexa.

Google’s DeepMind company, which created the AlphaGo program that beat a human world champion at the ancient Chinese game of Go, has now made a breakthrough in speech generation for machines. Google has also shown that DeepMind’s technology can cut the cooling bill for one of its data centers by 40%, and it has said that DeepMind has helped achieve “substantial improvements to a set of services from YouTube and Google Play to Google’s advertising products”.
