Regional Information Center for Science and Technology (RICEST) - Normalized amplitude modulation features for large vocabulary noise-robust speech recognition

DocumentCode :

3161979

Title :

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition

Author :

Mitra, Vikramjit ; Franco, Horacio ; Graciarena, Martin ; Mandal, Arindam

Author_Institution :

Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4117

Lastpage :

4120

Abstract :

Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared with state-of-the-art noise-robust features on Aurora-2 and a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both a clean and an artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all the train-test conditions of the renoised WSJ data and also improved digit recognition accuracies for Aurora-2 compared to the MFCCs and state-of-the-art noise-robust features.
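
The feature chain named in the abstract (Teager energy, power normalization, cosine transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe the filterbank, framing, or normalization details, so the function names, the 1/15 root-compression exponent, and the assumption that the input has already been split into subband signals are all hypothetical choices made for illustration. Only the discrete Teager operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], is standard.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]**2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def root_compress(values, exponent=1.0 / 15.0):
    """Power-law compression; the exponent is an assumed value."""
    # Clamp at zero because Teager energy can go negative on noisy input.
    return [max(v, 0.0) ** exponent for v in values]


def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def modulation_cepstrum(band_signals, n_coeffs):
    """Hypothetical per-frame feature: one mean Teager energy per subband,
    root-compressed and then cosine transformed."""
    band_energy = [
        sum(teager_energy(band)) / (len(band) - 2) for band in band_signals
    ]
    return dct2(root_compress(band_energy), n_coeffs)
```

For a pure tone A cos(Ωn + φ), the operator returns the constant A² sin²(Ω), which is why it is often used to track the amplitude and frequency of a narrowband signal.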

Keywords :

amplitude modulation; noise; speech recognition; vocabulary; Aurora-2 digit recognition task; NMCC features; SRI DECIPHER large vocabulary speech recognition system; WSJ word-recognition experiments; automatic speech recognition systems; background noise; channel degradations; human auditory system; human speech recognition performance; large vocabulary noise-robust speech recognition; normalized amplitude modulation feature; normalized modulation cepstral coefficient features; renoised WSJ corpus; renoised wall street journal corpus; standard HTK recognizer; teager nonlinear energy operator; Hidden Markov models; Noise; Noise robustness; Robustness; Speech; Speech recognition; Large Vocabulary Speech Recognition; Modulation Features; Noise-Robust Speech Recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288824

Filename :

6288824

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3161979