Title :
The purpose, history, current state, and some evolving trends in feature extraction for speech recognition
Author :
Hermansky, Professor H.
Author_Institution :
Graduate Inst. of Sci. & Technol., Portland, OR, USA
Abstract :
Summary form only given, as follows. First, the basic principles of automatic speech recognition (ASR) are reviewed. The acoustic analysis module is examined in greater detail, and the distinctions between its two main blocks, pattern classification and feature extraction, are discussed. The early history of speech feature extraction is covered, including early attempts by Newton and Helmholtz to characterize the information-bearing components of vowels and Scripture's analysis of phonographic voice recordings. The concepts of short-term analysis and spectrograms are introduced together with the linear model of speech production. Reasons for spectral envelope estimation in ASR, as well as basic techniques for it such as homomorphic analysis and linear predictive analysis, are introduced. The cepstrum as an approximation to the Karhunen-Loeve transformation, and cepstral lifters as a means of modifying the properties of simple Euclidean cepstral distances, are also introduced. Inconsistencies of simple envelope estimation techniques with human speech perception are mentioned. Reasons for auditory-like feature extraction and some currently dominant auditory-like techniques, such as Mel cepstral analysis and perceptual linear prediction (PLP), are described. The concept and basic properties of the modulation spectrum of speech are explained, and its historical use in predicting the intelligibility of speech in auditoria is mentioned. Dynamic features (delta, double-delta) are discussed, with a special focus on their interpretation as FIR filters applied to the modulation spectrum of speech. RASTA filtering is introduced as an extension of the FIR filtering done in dynamic feature estimation, and the reasons for its robustness to changes in communication environments are explained. Interesting consistencies of RASTA processing with temporal properties of human hearing, such as forward masking, are also mentioned.
The need for data-driven feature extraction is discussed, and techniques for the design of a discriminant spectral basis and of discriminant RASTA filters are described, with recent results of their application in automatic speech recognition and in speaker recognition. The concept of multi-band recognition of speech is introduced and its inherent robustness in the presence of colored noise is discussed. The concept is then generalized to sub-stream based recognition, and some techniques for merging information sub-streams are described. Finally, the recently introduced recognition of speech from temporal patterns of spectral energies is described, and its inherent advantages in adverse environments are discussed.
Keywords :
FIR filters; acoustic noise; cepstral analysis; estimation theory; feature extraction; history; prediction theory; speaker recognition; speech recognition; Karhunen-Loeve transformation; Mel cepstral analysis; RASTA filtering; RASTA processing; acoustic analysis module; auditory-like feature extraction; auditory-like techniques; cepstral lifters; cepstrum; colored noise; communication environments; data-driven feature extraction; dynamic features; forward masking; homomorphic analysis; human hearing; human speech perception; intelligibility; linear model; linear predictive analysis; modulation spectrum; multi-band recognition; pattern classification; perceptual linear prediction; phonographic voice recordings; short-term analysis; simple Euclidean cepstral distances; spectral envelope estimation; spectrograms; speech production; temporal properties; vowels; Automatic speech recognition; Filtering; Finite impulse response filter; Humans; Pattern analysis; Speech analysis;
Conference_Title :
Signal Processing and Its Applications, 1999. ISSPA '99. Proceedings of the Fifth International Symposium on
Conference_Location :
Brisbane, Qld.
Print_ISBN :
1-86435-451-8
DOI :
10.1109/ISSPA.1999.818095