Title :
Acoustic Factor Analysis for Streamed Hidden Markov Modeling
Author :
Chien, Jen-Tzung ; Ting, Chuan-Wei
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Abstract :
This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to extract the common factors underlying acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is captured by the common factors. Features corresponding to the same factor are generated by the same HMM state. Accordingly, multiple Markov chains are adopted to characterize the variation trends in different dimensions of the cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of the standard HMM topology, namely, that all features of a speech frame are emitted by the same state. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM), in which the streaming is determined empirically. To reduce the number of factor loading matrices in FA, we evaluate the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm is presented to perform FASHMM speech recognition. FASHMM runs the streamed Markov chains over a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly compared with the standard HMM and SFHMM methods.
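The idea of grouping feature dimensions by their shared factor can be illustrated with a minimal sketch. It is not the authors' estimation procedure: it assumes a principal-component approximation of the factor loading matrix, computed from the correlation matrix of the features, and assigns each dimension to the factor with the largest absolute loading. The function name `assign_streams` and its parameters are hypothetical.

```python
import numpy as np

def assign_streams(features, n_factors):
    """Group feature dimensions into streams by their dominant factor.

    features: (n_frames, n_dims) array of cepstral-like vectors.
    Returns (loadings, stream_of_dim), where stream_of_dim[d] is the
    index of the factor with the largest absolute loading on dimension d.
    Assumption: loadings are approximated from the top eigenpairs of the
    correlation matrix, not fitted by maximum-likelihood factor analysis.
    """
    # Correlation matrix between feature dimensions.
    corr = np.corrcoef(features, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Scale eigenvectors by sqrt(eigenvalue) to obtain approximate loadings.
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    # Each dimension joins the stream of its dominant factor.
    stream_of_dim = np.argmax(np.abs(loadings), axis=1)
    return loadings, stream_of_dim
```

In a streamed HMM, each resulting group of dimensions would then be modeled by its own Markov chain; the state-dependent loading matrices and their clustering described in the abstract are beyond this sketch.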
Keywords :
Gaussian processes; acoustic signal processing; cepstral analysis; correlation methods; covariance matrices; decoding; feature extraction; hidden Markov models; speech coding; speech recognition; FASHMM speech recognition; acoustic factor analysis principle; cepstral feature correlation extraction; covariance matrix; decoding algorithm; multiple Markov chain; multivariate Gaussian mixture; streamed hidden Markov modeling framework; Acoustic noise; Speech enhancement; Speech processing; Standards development; Streaming media; Topology; Factor analysis (FA); Markov chain; streamed hidden Markov model
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2014891