Title :
Speech feature analysis using variational Bayesian PCA
Author :
Kwon, Oh-Wook ; Chan, Kwokleung ; Lee, Te-Won
Author_Institution :
Inst. for Neural Comput., Univ. of California, La Jolla, CA, USA
fDate :
5/1/2003
Abstract :
In most hidden Markov model-based automatic speech recognition systems, a fundamental question is how to determine the intrinsic dimensionality of the speech features and the number of clusters used in the Gaussian mixture model. We analyzed mel-frequency band energies using a variational Bayesian principal component analysis method that estimates both the feature dimensionality and the number of Gaussian mixture components by maximizing a lower bound on the evidence, rather than maximizing the likelihood function as in conventional speech recognition systems. In an analysis of the Texas Instruments/Massachusetts Institute of Technology (TIMIT) speech database, the method revealed the intrinsic structures of vowels and consonants. Its usefulness is demonstrated by superior classification performance on the most difficult phonemes /b/, /d/, and /g/.
Keywords :
Bayes methods; Gaussian processes; feature extraction; hidden Markov models; principal component analysis; speech processing; speech recognition; Gaussian mixture model; Gaussian mixtures; HMM-based automatic speech recognition systems; Massachusetts Institute of Technology; TIMIT speech database; Texas Instruments; classification performance; consonants structure; evidence; feature dimensionality estimation; hidden Markov model; maximum lower bound; mel-frequency band energies; phonemes; speech feature analysis; speech feature dimensionality; variational Bayesian PCA; vowels structure; Associate members; Bayesian methods; Cepstral analysis; Instruments; Probability distribution; Spatial databases; Speech analysis
Journal_Title :
Signal Processing Letters, IEEE
DOI :
10.1109/LSP.2003.810017