Title :
Enhanced power-normalized features for Mandarin robust speech recognition based on a voiced-unvoiced-silence decision
Author :
Ying-Wei Tan ; Wen-Ju Liu ; Zhan-Lei Yang ; Ming-ming Chen
Author_Institution :
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
Abstract :
Power-normalized features have been shown to improve the performance of English large vocabulary continuous speech recognition under different acoustic conditions. In this paper, considering the tone characteristics of Mandarin speech, we apply a different strategy to each class of sound, based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise-model (WHNM) is applied to accurately capture the salient harmonic information and decrease the effect of various non-stationary noises; standard power-normalized processing (SPNP) is then performed. For unvoiced sounds, only the SPNP is used. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. As a result, enhanced power-normalized features are obtained and used to process noise-corrupted Mandarin speech. The experimental results show better recognition accuracy than the ETSI Advanced Front-End (AFE) for Mandarin continuous speech recognition in noisy environments.
Keywords :
decision theory; speech processing; speech recognition; AFE; ETSI advanced front-end; English large vocabulary continuous speech recognition; FD algorithm; Mandarin continuous speech recognition; Mandarin robust speech recognition; SPNP; WHNM; acoustic conditions; enhanced power-normalized features; harmonic enhancement; noise-corrupted Mandarin speech processing; nonstationary noises; quality frame dropping algorithm; salient harmonic information; standard power-normalized processing; voiced sounds; voiced-unvoiced-silence decision; weighted harmonic-noise-model; Accuracy; Noise; Noise measurement; Robustness; Speech; Speech recognition; Telecommunication standards
Conference_Title :
Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4799-5401-8
DOI :
10.1109/ChinaSIP.2014.6889236