Regional Information Center for Science and Technology - Discriminative Input Stream Combination for Conditional Random Field Phone Recognition

DocumentCode :

812630

Title :

Discriminative Input Stream Combination for Conditional Random Field Phone Recognition

Author :

Heintz, Ilana ; Fosler-Lussier, Eric ; Brew, Chris

Author_Institution :

Dept. of Linguistics, Ohio State Univ., Columbus, OH, USA

Volume :

Issue :

fYear :

2009

Firstpage :

1533

Lastpage :

1546

Abstract :

In recent studies, we and others have found that conditional random fields (CRFs) can be effectively used to perform phone classification and recognition tasks by combining non-Gaussian distributed representations of acoustic input. In previous work (I. Heintz, "Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition," Proc. ICASSP, pp. 4541-4544, 2008), we experimented with combining phonological feature posterior estimators and phone posterior estimators within a CRF framework; we found that treating posterior estimates as terms in a "phoneme information retrieval" task allowed for a more effective use of multiple posterior streams than directly feeding these acoustic representations to the CRF recognizer. In this paper, we examine some of the design choices in our previous work and extend our results to up to six acoustic feature streams. We concentrate on feature design, rather than feature selection, to find the best way of combining features for introduction into a log-linear model. We improve upon our previous work, finding that several different dimensionality reduction techniques (SVD, PARAFAC2, KLT), followed by a nonlinear transform provided by a multilayer perceptron, provide a significant gain in phone recognition accuracy on the TIMIT task.
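
As a rough illustration of the stream-combination idea in the abstract, the sketch below concatenates two per-frame posterior streams and projects them onto a lower-dimensional basis with a truncated SVD. It is not the authors' implementation: the function name `combine_streams_svd`, the stream sizes, the random Dirichlet data standing in for posterior estimates, and the mean-centering step are all assumptions made for the example. The subsequent multilayer perceptron transform and the CRF recognizer are not shown.

```python
import numpy as np

def combine_streams_svd(streams, k):
    """Concatenate per-frame streams and keep the top-k SVD directions.

    streams: list of arrays, each of shape (n_frames, dims_i)
    k: number of reduced dimensions to keep (hypothetical parameter)
    """
    X = np.hstack(streams)                 # (n_frames, sum of dims_i)
    mean = X.mean(axis=0)
    Xc = X - mean                          # centering is an assumption of this sketch
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:k].T                       # (total_dims, k), orthonormal columns
    return Xc @ basis, basis, mean

# Synthetic stand-ins for posterior estimates (each row sums to 1).
rng = np.random.default_rng(0)
phone_post = rng.dirichlet(np.ones(5), size=20)  # 20 frames, 5 hypothetical phone classes
feat_post = rng.dirichlet(np.ones(3), size=20)   # 20 frames, 3 hypothetical feature classes

reduced, basis, mean = combine_streams_svd([phone_post, feat_post], k=4)
print(reduced.shape)  # one 4-dimensional vector per frame
```

In a pipeline of the kind the abstract describes, the reduced vectors would then be passed through a nonlinear transform before being used as CRF input features; the basis and mean would be estimated on training data and reused unchanged on test data.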

Keywords :

acoustic signal processing; feature extraction; hidden Markov models; multilayer perceptrons; random processes; signal classification; signal representation; singular value decomposition; speech recognition; statistical distributions; transforms; CRF phone recognition; HMM; KLT technique; PARAFAC2 technique; SVD technique; TIMIT task; acoustic feature representation; automatic speech recognition; conditional random field phone recognition; dimensionality reduction technique; discriminative input stream combination; feature selection; latent phonetic analysis; log-linear model; multilayer perceptron; non-Gaussian distribution; nonlinear transform; phone classification; phone posterior estimator; phoneme information retrieval; phonological feature posterior estimator; matrix decomposition; stochastic fields

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2022204

Filename :

4909058

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=812630