Title :
Improvements to filterbank and delta learning within a deep neural network framework
Author :
Sainath, Tara N. ; Kingsbury, Brian ; Mohamed, Abdel-rahman ; Saon, George ; Ramabhadran, Bhuvana
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is, minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that filter and delta learning achieve a 5% relative improvement in word error rate (WER) compared with a fixed set of filters and deltas. Furthermore, after speaker adaptation, filter and delta learning yield a 3% relative improvement in WER over a state-of-the-art CNN.
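The abstract does not give the parameterization of the learned filter bank or of delta learning, so the following is only a minimal forward-pass sketch under stated assumptions. It assumes the filter bank layer applies non-negative weights (stored in the log domain and exponentiated) to each power-spectrum frame followed by log compression, and that "delta learning" means treating the regression-window coefficients of ordinary deltas as trainable parameters. All names (filterbank_layer, standard_delta_coeffs, delta_layer, learned_features, log_weights, coeffs) are hypothetical and not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def filterbank_layer(power_spec: List[float], log_weights: Matrix) -> List[float]:
    """Hypothetical learned filter bank applied to one power-spectrum frame.

    Filter j outputs log(sum_k exp(log_weights[j][k]) * power_spec[k]).
    Keeping weights in the log domain makes every filter non-negative, as
    mel filters are; this parameterization is an assumption.
    """
    outputs = []
    for row in log_weights:
        energy = sum(math.exp(w) * x for w, x in zip(row, power_spec))
        outputs.append(math.log(energy + 1e-8))  # small floor avoids log(0)
    return outputs


def standard_delta_coeffs(window: int) -> List[float]:
    """Fixed regression-window coefficients a_n = n / (2 * sum_m m^2), n = 1..window."""
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    return [n / denom for n in range(1, window + 1)]


def delta_layer(frames: Matrix, coeffs: List[float]) -> Matrix:
    """Hypothetical learnable delta: d_t = sum_n coeffs[n-1] * (c_{t+n} - c_{t-n}).

    Initialized with standard_delta_coeffs, this reproduces ordinary deltas;
    treating coeffs as trainable is the assumed form of delta learning.
    Edge frames are repeated at utterance boundaries.
    """
    num_frames, dim = len(frames), len(frames[0])
    deltas = []
    for t in range(num_frames):
        d = [0.0] * dim
        for n, a in enumerate(coeffs, start=1):
            nxt = frames[min(t + n, num_frames - 1)]
            prv = frames[max(t - n, 0)]
            for i in range(dim):
                d[i] += a * (nxt[i] - prv[i])
        deltas.append(d)
    return deltas


def learned_features(spectra: Matrix, log_weights: Matrix, coeffs: List[float]) -> Matrix:
    """Per frame: static log filter-bank outputs, then deltas, then delta-deltas."""
    static = [filterbank_layer(frame, log_weights) for frame in spectra]
    d1 = delta_layer(static, coeffs)
    d2 = delta_layer(d1, coeffs)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

Both layers are differentiable in log_weights and coeffs, so a DNN toolkit could, in principle, update them by backpropagation together with the rest of the network instead of keeping them fixed; that training loop is not shown here.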
Keywords :
channel bank filters; learning (artificial intelligence); neural nets; speaker recognition; speech recognition; CNN; English Broadcast News task; VTLN warping; WER; deep neural network; delta learning; filter learning; perceptually motivated mel-filter bank; speaker adaptation; speaker adaptation techniques; speaker identity features; speech recognition; word error rate; Acoustics; Equations; Hidden Markov models; Mathematical model; Neural networks; Speech; Speech recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854925