Microsoft researchers have reached a new milestone in teaching computers to understand human speech. Computers are getting steadily better at recognizing the words we speak: about two decades ago, the rate at which leading systems misrecognized words stood at 43%, and Microsoft's latest speech recognition work has now brought that figure down to 6.3%, narrowing the gap with human performance in a field crowded with competing players.
Neural Networks Hold The Key To Speech Recognition
Both Microsoft and IBM credit their recent advances in speech recognition to deep neural networks. These networks take loose inspiration from biological processes in the human brain and reproduce them in software, helping computers understand speech far more accurately.
Xuedong Huang, Microsoft's chief speech scientist, reported that using neural networks the team achieved a Word Error Rate (WER) of 6.3% on the industry-standard Switchboard speech recognition task, the lowest WER reported so far when compared with other speech recognition systems.
At Interspeech, the international conference on speech communication and technology held in San Francisco, IBM announced that it had achieved a WER of 6.9%. For comparison, WER was as high as 43% about two decades ago.
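To make the 6.3% figure concrete: WER measures the minimum number of word insertions, deletions, and substitutions needed to turn a system's transcript into the reference transcript, divided by the number of reference words. The following is an illustrative sketch in Python (not Microsoft's or IBM's evaluation code) using the standard edit-distance formulation:

```python
def wer(reference, hypothesis):
    """Word Error Rate: edit distance between word sequences,
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words, computed by dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # → 0.1666...
```

So a 6.3% WER means that, on average, roughly 6 words out of every 100 in the reference transcript are misrecognized.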
How Microsoft Managed to Achieve This
Neural networks are built from many stacked layers. Microsoft's research team previously won the ImageNet computer vision challenge with a deep residual neural network, which uses a new cross-layer connection scheme, and that same layering approach feeds into its speech work.
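The idea behind that cross-layer scheme is the residual (or "skip") connection: each block adds its own input back onto its output, so very deep stacks remain trainable. The following is a minimal NumPy sketch of the concept, not Microsoft's actual architecture; the weight matrices `w1` and `w2` are hypothetical placeholders:

```python
import numpy as np

def relu(x):
    """Standard rectified-linear activation."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """A toy residual block: two linear transforms with a skip
    connection that adds the input x back to the result."""
    out = relu(x @ w1)   # first transformation
    out = out @ w2       # second transformation
    return relu(out + x) # skip connection: add the input back

# Example: a batch of 4 feature vectors of width 8 passes through
# the block with its shape preserved, as the skip connection requires.
x = np.random.randn(4, 8)
y = residual_block(x, np.random.randn(8, 8), np.random.randn(8, 8))
```

Because the block only has to learn a correction on top of the identity mapping, gradients flow through the skip path even when the stack is dozens of layers deep.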
The speech system was built with the Computational Network Toolkit (CNTK), which Microsoft developed to advance its speech recognition work. CNTK allows neural network algorithms to run an order of magnitude faster than they otherwise would. The other key ingredient is GPUs (Graphics Processing Units, or graphics cards in layman's terms).
GPUs excel at parallel processing, which lets deep neural network algorithms run far more efficiently. Thanks to GPUs and CNTK, Cortana, Microsoft's voice assistant, can now consume ten times more speech data than before.