DocumentCode
700028
Title
Spectro-temporal features for Automatic Speech Recognition using Linear Prediction in spectral domain
Author
Thomas, Samuel ; Ganapathy, Sriram ; Hermansky, Hynek
Author_Institution
IDIAP Res. Inst., Martigny, Switzerland
fYear
2008
fDate
25-29 Aug. 2008
Firstpage
1
Lastpage
4
Abstract
Frequency Domain Linear Prediction (FDLP) provides an efficient way to represent temporal envelopes of a signal using auto-regressive models. For the input speech signal, we use FDLP to estimate temporal trajectories of sub-band energy by applying linear prediction on the cosine transform of sub-band signals. The sub-band FDLP envelopes are used to extract spectral and temporal features for speech recognition. The spectral features are derived by integrating the temporal envelopes in short-term frames, and the temporal features are formed by converting these envelopes into modulation frequency components. These features are then combined at the phoneme posterior level and used as the input features for a hybrid HMM-ANN based phoneme recognizer. The proposed spectro-temporal features provide a phoneme recognition accuracy of 69.1% (an improvement of 4.8% over the Perceptual Linear Prediction (PLP) baseline) on the TIMIT database.
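The core FDLP step described in the abstract (linear prediction applied to the cosine transform of a signal, giving an auto-regressive model of its temporal envelope) can be sketched roughly as follows. This is a minimal full-band illustration, not the authors' implementation: the function names `dct_ii`, `lpc_autocorr`, and `fdlp_envelope`, the model order, and the small ridge term are all assumptions made for the example, and the sub-band decomposition, frame integration, and modulation features are omitted.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (O(N^2), fine for a short demo)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def lpc_autocorr(c, order):
    """Autocorrelation-method linear prediction; returns (a, gain) with a[0] == 1."""
    n = len(c)
    r = np.array([np.dot(c[:n - lag], c[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # assumed small ridge for numerical stability
    coeffs = np.linalg.solve(R, r[1:order + 1])
    a = np.concatenate(([1.0], -coeffs))
    gain = r[0] - np.dot(coeffs, r[1:order + 1])
    return a, gain

def fdlp_envelope(x, order=20, n_points=None):
    """Approximate the squared temporal envelope of x with an all-pole model.

    Linear prediction is run on the DCT of x, so the model's "spectrum"
    gain / |A|^2, evaluated over [0, pi), is indexed by time rather than frequency.
    """
    n_points = n_points or len(x)
    c = dct_ii(x)
    a, gain = lpc_autocorr(c, order)
    w = np.pi * (np.arange(n_points) + 0.5) / n_points
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2
```

Under these assumptions, a sub-band variant would apply `lpc_autocorr` to a windowed segment of the DCT coefficients instead of the full sequence, yielding one envelope per band.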
Keywords
autoregressive processes; feature extraction; hidden Markov models; neural nets; signal representation; speech recognition; TIMIT database; auto-regressive models; automatic speech recognition; cosine transform; frequency domain linear prediction; hybrid HMM-ANN based phoneme recognizer; perceptual linear prediction; phoneme recognition; signal representation; spectral domain; spectral feature extraction; spectro-temporal features; speech signal; temporal feature extraction; Abstracts; Corporate acquisitions; Databases; Discrete cosine transforms; Neurons; Speech; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2008 16th European
Conference_Location
Lausanne
ISSN
2219-5491
Type
conf
Filename
7080560