Regional Information Center for Science and Technology - Modulation spectrum power-law expansion for robust speech recognition

DocumentCode :

661450

Title :

Modulation spectrum power-law expansion for robust speech recognition

Author :

Hao-teng Fan ; Zi-Hao Ye ; Jeih-weih Hung

Author_Institution :

Dept. of Electr. Eng., Nat. Chi Nan Univ., Nantou, Taiwan

fYear :

2013

fDate :

Oct. 29 2013-Nov. 1 2013

Firstpage :

Lastpage :

Abstract :

In this paper, we present a novel approach to enhancing speech features in the modulation spectrum for better recognition performance in noise-corrupted environments. In the presented approach, termed modulation spectrum power-law expansion (MSPLE), the temporal stream of speech features is first pre-processed by a statistics compensation technique, such as mean and variance normalization (MVN), cepstral gain normalization (CGN), or MVN plus ARMA filtering (MVA), and then the magnitude of the modulation spectrum (the Fourier transform of the feature stream) is raised to a power (exponentiated). We find that MSPLE can highlight the speech components and reduce the noise distortion remaining in the statistics-compensated speech features. On the Aurora-2 digit database task, experimental results show that this process consistently achieves very promising recognition accuracy under a wide range of noise-corrupted environments. MSPLE applied to MVN-preprocessed features yields about 55% error rate reduction relative to the MFCC baseline and significantly outperforms MVN alone. Furthermore, performing MSPLE on only the lower sub-band modulation spectra gives results very close to those obtained by applying MSPLE to the full-band modulation spectra, indicating that a less complicated MSPLE suffices to produce noise-robust speech features.
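
The processing chain described in the abstract (statistics compensation, Fourier transform along time, power-law expansion of the magnitude, return to the time domain) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. The function names, the choice of MVN as the pre-processing step, the default `exponent` value, and the decision to keep the original phase unchanged are all assumptions made for illustration; the abstract does not state the exponent used or how the phase is handled. A naive DFT is used only to keep the example self-contained; a real system would use an FFT library.

```python
import cmath
import math


def mvn(stream):
    """Mean and variance normalization of one feature trajectory (list of floats)."""
    n = len(stream)
    mean = sum(stream) / n
    var = sum((x - mean) ** 2 for x in stream) / n
    std = math.sqrt(var) or 1.0  # guard against a constant stream
    return [(x - mean) / std for x in stream]


def dft(x):
    """Naive discrete Fourier transform along time (the modulation spectrum)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part as the new feature stream."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def msple(stream, exponent=2.0):
    """Sketch of MSPLE on one feature trajectory.

    Assumptions (not taken from the abstract): MVN is the pre-processing
    step, the phase is left untouched, and `exponent` is a hypothetical
    default value.
    """
    normalized = mvn(stream)
    spectrum = dft(normalized)
    # Raise only the magnitude to the given power; keep the original phase.
    expanded = [cmath.rect(abs(c) ** exponent, cmath.phase(c)) for c in spectrum]
    return idft(expanded)
```

In this sketch, `msple(trajectory, exponent=1.0)` simply returns the MVN-normalized trajectory, which makes the effect of the exponent easy to isolate. Each cepstral coefficient's time trajectory would be processed independently in the same way.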

Keywords :

Fourier transforms; filtering theory; speech recognition; statistical analysis; ARMA filtering; CGN; Fourier transform; MSPLE; MVN; cepstral gain normalization; feature stream; full band modulation spectra; mean and variance normalization; modulation spectrum power law expansion; noise corrupted environments; noise distortion; robust speech recognition; speech components; speech feature temporal stream; speech features; statistics compensation technique; subband modulation spectra; Accuracy; Frequency modulation; Mel frequency cepstral coefficient; Noise robustness; Speech; Speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific

Conference_Location :

Kaohsiung

Type :

conf

DOI :

10.1109/APSIPA.2013.6694312

Filename :

6694312

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=661450