Title :
Discriminative Input Stream Combination for Conditional Random Field Phone Recognition
Author :
Heintz, Ilana ; Fosler-Lussier, Eric ; Brew, Chris
Author_Institution :
Dept. of Linguistics, Ohio State Univ., Columbus, OH, USA
Abstract :
In recent studies, we and others have found that conditional random fields (CRFs) can be effectively used to perform phone classification and recognition tasks by combining non-Gaussian distributed representations of acoustic input. In previous work by I. Heintz (Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition, Proc. ICASSP, pp. 4541-4544, 2008), we experimented with combining phonological feature posterior estimators and phone posterior estimators within a CRF framework; we found that treating posterior estimates as terms in a "phoneme information retrieval" task allowed for a more effective use of multiple posterior streams than directly feeding these acoustic representations to the CRF recognizer. In this paper, we examine some of the design choices in our previous work, and extend our results to up to six acoustic feature streams. We concentrate on feature design, rather than feature selection, to find the best way of combining features for introduction into a log-linear model. Improving upon our previous work, we find that several different dimensionality reduction techniques (SVD, PARAFAC2, KLT), followed by a nonlinear transform provided by a multilayer perceptron, provide a significant gain in phone recognition accuracy on the TIMIT task.
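The abstract lists the Karhunen-Loève transform (KLT) among the dimensionality reduction steps applied to combined input streams. The following is a minimal pure-Python sketch of that single step, not the authors' implementation. It assumes the streams are frame-aligned and simply concatenated per frame, and it computes the KLT (equivalently, PCA) with power iteration and Gram-Schmidt orthogonalization. All function names (`concat_streams`, `mean_and_covariance`, `top_eigenvectors`, `klt`) are hypothetical, the toy posterior values are made up, and the subsequent multilayer perceptron and CRF stages are not shown.

```python
import math
import random


def concat_streams(streams):
    """Concatenate frame-aligned streams (lists of per-frame vectors) into one vector per frame."""
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    return [[x for s in streams for x in s[t]] for t in range(n_frames)]


def mean_and_covariance(X):
    """Return the per-dimension mean and the sample covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    if n < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return mean, [[v / (n - 1) for v in r] for r in cov]


def _orthonormalize(v, basis):
    """Remove components of v along the (orthonormal) basis, then normalize; None if nothing is left."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else None


def top_eigenvectors(C, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix C, by power iteration kept orthogonal to earlier ones."""
    d = len(C)
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the feature dimension")
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = None
        while v is None:  # random start vector orthogonal to the eigenvectors found so far
            v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(d)], basis)
        for _ in range(iters):
            w = _orthonormalize([sum(C[i][j] * v[j] for j in range(d)) for i in range(d)], basis)
            if w is None:  # no variance left outside the span of `basis`
                break
            v = w
        basis.append(v)
    return basis


def klt(X, k):
    """Project the mean-centred rows of X onto the k leading covariance eigenvectors."""
    mean, C = mean_and_covariance(X)
    basis = top_eigenvectors(C, k)
    return [[sum((x - m) * b for x, m, b in zip(row, mean, bj)) for bj in basis] for row in X]


if __name__ == "__main__":
    # Toy phone and phonological-feature "posteriors" for five frames (made-up numbers).
    phone_post = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
    feat_post = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.5, 0.5]]
    reduced = klt(concat_streams([phone_post, feat_post]), k=2)
    for frame in reduced:
        print([round(x, 3) for x in frame])
```

In this sketch each output frame has `k` decorrelated dimensions, which is the kind of compact input a later nonlinear stage (here omitted) could consume; the eigenvectors are only determined up to sign.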
Keywords :
acoustic signal processing; feature extraction; hidden Markov models; multilayer perceptrons; random processes; signal classification; signal representation; singular value decomposition; speech recognition; statistical distributions; transforms; CRF phone recognition; HMM; KLT technique; PARAFAC2 technique; SVD technique; TIMIT task; acoustic feature representation; automatic speech recognition; conditional random field phone recognition; dimensionality reduction technique; discriminative input stream combination; feature selection; latent phonetic analysis; log-linear model; multilayer perceptron; nonGaussian distribution; nonlinear transform; phone classification; phone posterior estimator; phoneme information retrieval; phonological feature posterior estimator; Matrix decomposition; stochastic fields;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2022204