Regional Information Center for Science and Technology (RICeST) - A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks

DocumentCode :

38874

Title :

A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks

Author :

Bo Li ; Khe Chai Sim

Author_Institution :

Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore

Volume :

Issue :

fYear :

2014

fDate :

Aug. 2014

Firstpage :

1296

Lastpage :

1305

Abstract :

Improving the noise robustness of automatic speech recognition systems has been a challenging task for many years. Recently, it was found that Deep Neural Networks (DNNs) yield large performance gains over conventional GMM-HMM systems when used in both hybrid and tandem systems. However, they are still far from the level of human expectations, especially under adverse environments. Motivated by the separation-prior-to-recognition process of the human auditory system, we propose a robust spectral masking system where power spectral domain masks are predicted using a DNN trained on the same filter-bank features used for acoustic modeling. To further improve performance, Linear Input Network (LIN) adaptation is applied to both the mask estimator and the acoustic model DNNs. Since the estimation of LINs for the mask estimator requires stereo data, which is not available during testing, we propose using the LINs estimated for the acoustic model DNNs to adapt the mask estimators. Furthermore, we use the same set of weights obtained from pre-training for the input layers of both the mask estimator and the acoustic model DNNs to ensure better consistency for sharing LINs. Experimental results on the benchmark Aurora2 and Aurora4 tasks demonstrate the effectiveness of our system, which yields Word Error Rates (WERs) of 4.6% and 11.8%, respectively. Furthermore, simple averaging of posteriors from systems with and without spectral masking further reduces the WERs to 4.3% on Aurora2 and 11.4% on Aurora4.
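The two operations named in the abstract, applying a power spectral domain mask and averaging posteriors from two systems, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `apply_mask` and `average_posteriors`, the clipping of mask values to [0, 1], and the renormalization step are all assumptions made for illustration, and the mask here is a plain list of numbers rather than the output of a DNN.

```python
def apply_mask(noisy_power, mask):
    """Scale each power-spectrum bin of one frame by its mask value.

    Assumption: mask values are clipped to [0, 1] before use; the
    abstract does not state how the paper bounds its masks.
    """
    if len(noisy_power) != len(mask):
        raise ValueError("frame and mask must have the same number of bins")
    return [p * min(1.0, max(0.0, m)) for p, m in zip(noisy_power, mask)]


def average_posteriors(post_a, post_b):
    """Average two posterior vectors defined over the same set of states.

    Assumption: the result is renormalized to sum to 1, which only
    guards against rounding drift when both inputs are already normalized.
    """
    if len(post_a) != len(post_b):
        raise ValueError("posterior vectors must have the same length")
    avg = [(a + b) / 2.0 for a, b in zip(post_a, post_b)]
    total = sum(avg)
    return [v / total for v in avg]
```

In this sketch, `post_a` and `post_b` would stand for the per-frame state posteriors from the systems with and without spectral masking; how those posteriors are produced is outside its scope.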

Keywords :

channel bank filters; neural nets; speech recognition; Aurora2 task; Aurora4 task; GMM-HMM systems; acoustic modeling; automatic speech recognition systems; deep neural networks; filter-bank features; human auditory system; hybrid system; linear input network adaptation; mask estimator; noise robustness; noise-robust speech recognition; power spectral domain masks; separation-prior-to-recognition process; spectral masking approach; stereo data; tandem system; word error rates; Acoustics; Adaptation models; Estimation; IEEE transactions; Noise; Speech; Speech processing; Deep neural network; noise robustness; spectral masking;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2014.2329237

Filename :

6826528

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=38874