DocumentCode :
661450
Title :
Modulation spectrum power-law expansion for robust speech recognition
Author :
Hao-teng Fan ; Zi-Hao Ye ; Jeih-weih Hung
Author_Institution :
Dept. of Electr. Eng., Nat. Chi Nan Univ., Nantou, Taiwan
fYear :
2013
fDate :
Oct. 29 2013-Nov. 1 2013
Firstpage :
1
Lastpage :
5
Abstract :
In this paper, we present a novel approach to enhancing speech features in the modulation spectrum domain for better recognition performance in noise-corrupted environments. In the presented approach, termed modulation spectrum power-law expansion (MSPLE), the temporal stream of speech features is first pre-processed by a statistics compensation technique, such as mean and variance normalization (MVN), cepstral gain normalization (CGN), or MVN plus ARMA filtering (MVA). The magnitude of the modulation spectrum (the Fourier transform of the feature stream) is then raised to a power. We find that MSPLE can highlight the speech components and reduce the noise distortion remaining in the statistics-compensated speech features. On the Aurora-2 digit database task, experimental results show that this process consistently achieves very promising recognition accuracy under a wide range of noise-corrupted environments. MSPLE applied to MVN-preprocessed features yields a relative error rate reduction of about 55% compared with the MFCC baseline and significantly outperforms MVN alone. Furthermore, applying MSPLE only to the lower sub-band of the modulation spectrum gives results very close to those obtained by applying it to the full-band modulation spectrum, indicating that a less complicated form of MSPLE suffices to produce noise-robust speech features.
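The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names `mvn` and `msple`, the default exponent of 1.5, and the choice to keep the original phase unchanged are all assumptions, since the abstract does not state the exponent value or any post-processing step.

```python
import numpy as np


def mvn(stream):
    """Mean and variance normalization of a 1-D feature stream over time."""
    stream = np.asarray(stream, dtype=float)
    std = stream.std()
    # Guard against a constant stream (zero variance).
    return (stream - stream.mean()) / (std if std > 0 else 1.0)


def msple(stream, exponent=1.5):
    """Hypothetical sketch of modulation spectrum power-law expansion.

    The exponent default is an assumed value, not taken from the paper.
    """
    normalized = mvn(stream)                 # statistics compensation step
    spectrum = np.fft.rfft(normalized)       # modulation spectrum of the stream
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Raise the magnitude to a power; the phase is left untouched (assumption).
    expanded = (magnitude ** exponent) * np.exp(1j * phase)
    # Return to the time domain with the original number of frames.
    return np.fft.irfft(expanded, n=len(normalized))
```

In practice the function would be applied separately to each feature dimension (for example, each cepstral coefficient) of an utterance. With `exponent=1.0` the sketch reduces to plain MVN, which makes a convenient sanity check.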
Keywords :
Fourier transforms; filtering theory; speech recognition; statistical analysis; ARMA filtering; CGN; Fourier transform; MSPLE; MVN; cepstral gain normalization; feature stream; full band modulation spectra; mean and variance normalization; modulation spectrum power law expansion; noise corrupted environments; noise distortion; robust speech recognition; speech components; speech feature temporal stream; speech features; statistics compensation technique; subband modulation spectra; Accuracy; Frequency modulation; Mel frequency cepstral coefficient; Noise robustness; Speech; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
Conference_Location :
Kaohsiung
Type :
conf
DOI :
10.1109/APSIPA.2013.6694312
Filename :
6694312