The neural network is taught to almost perfectly replicate the human voice

Date:

2017-10-07 11:30:06

Views:

1498

Rating:

1Like 0Dislike

Share:

The neural network is taught to almost perfectly replicate the human voice

Last year, the company DeepMind engaged in the development of artificial intelligence technology, shared details about his new project WaveNet neural networks deep learning that can be used to sintetici realistic human speech. Recently was released an upgraded version of this technology that will be used as the basis of the digital mobile assistant Google Assistant.

A System of voice synthesis (also known as conversion function "text-to-speech" text-to-speech, TTS) is usually built on the basis of one of two basic methods. Concatenative (or composite) method involves the construction of phrases through the collection of separate pieces of recorded words and parts pre-recorded with the involvement of the actor dubbing. The main disadvantage of this method is the need for constant replacement sound library every time, when there are any updates or changes.

Another method is called the parametric TTS, and its feature is the use of sets of parameters by which the computer generates the desired phrase. Minus the method that is most often the result manifests itself in the form of so-called unrealistic or robotic sound.

As for WaveNet, it produces sound waves from scratch based on the system based on convolutional neural networks, where sound generation happens in several layers. First for training platform centenarii "live" speech, her "feed" a huge amount of samples, thus noting which audible signals sound realistic and which are not. It gives a voice synthesizer reproduce naturalistic intonation, and even such details as the sounds of smacking lips. Depending on which samples are run through a speech system, this allows her to develop a unique "accent" that could eventually be used to create many different voices.

the

Sharp tongue

Perhaps the biggest limitation of the WaveNet system was that it required a huge amount of computing power, and even in this condition it was not different speed. For example, for generation of 0.02 seconds of sound she had about 1 second of time.

After a year working DeepMind engineers still found a way to improve and optimize the system so that it is now able to produce a raw sound with a duration of one second using only 50 milliseconds, which is 1000 times faster than its original capacity. Moreover, the experts managed to increase the audio sampling rate with 8-bit to 16-bit, which has a positive impact on the tests with the involvement of the audience. Thanks to these successes, WaveNet opened the road for integration into such consumer products as Google Assistant.

Currently, WaveNet can be used to generate English and Japanese voices via Google Assistant and all platforms that use the digital assistant. Because the system can create a special type of votes depending on which set of samples was provided for learning, then soon Google will most likely implement in WaveNet support centenarii realistic speech and other tongues, including with regard to their local dialects.

Speech interfaces are becoming more and more common on a variety of platforms, but their distinct unnatural nature sound repels many potential users. Attempts company DeepMind to improve this technology will certainly contribute to a broader dissemination of these voice systems, and will also improve user experience from their use.

Examples of English and Japanese synthesized speech using neural network, WaveNet can be found .

Recommended

The Oculus zuest 2 virtual reality helmet for $300. What's he capable of?

The Oculus zuest 2 virtual reality helmet for $300. What's he capable of?

Why is the new Oculus zuest 2 better than the old model? Let's work it out together. About a decade ago, major technology manufacturers introduced the first virtual reality helmets that were available to ordinary users. There were two ways to find yo...

The mysteries of neurotechnology - can the brain be used as a weapon?

The mysteries of neurotechnology - can the brain be used as a weapon?

DARPA has launched the development of a neural engineering system to research a technology that can turn soldiers into cyborgs Despite the fact that the first representatives of the species Homo Sapiens appeared on Earth about 300,000 - 200,000 years...

What materials can be used to build houses on Mars?

What materials can be used to build houses on Mars?

Marsha constructions n the surface of the Red Planet SpaceX CEO Elon Musk is hopeful that humans will go to Mars in the next ten years. Adapted for long flight ship Starship is already in development, but scientists have not yet decided where exactly...

Comments (0)

This article has no comment, be the first!

Add comment

Related News

Hybrid planes Zunum reduce the cost of flights will be unmanned

Hybrid planes Zunum reduce the cost of flights will be unmanned

American startup Zunum is working to create a hybrid electrochemica for several years and during that time managed to achieve significant . Despite the fact that all his projects are still in development, he has managed to enlist ...

In the world of tomorrow not only you can watch movies, but they are for you

In the world of tomorrow not only you can watch movies, but they are for you

When you are in a dark movie theater, your reaction to what is happening on the screen often go unnoticed by others. Here you wide open the eyes in case of an unexpected plot twist, literally Bouncing in my chair from scary scenes...

The neural network is taught to experience emotions

The neural network is taught to experience emotions

In the ongoing conferences in Moscow «Neuroinformatics-2017» developments in the field of neuroscience, particular attention is paid to work on the creation of artificial intelligence. But solely by the lectures is not l...