DocumentCode :
52197
Title :
Temporally Varying Weight Regression: A Semi-Parametric Trajectory Model for Automatic Speech Recognition
Author :
Shilin Liu ; Khe Chai Sim
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
Volume :
22
Issue :
1
fYear :
2014
fDate :
Jan. 2014
Firstpage :
151
Lastpage :
160
Abstract :
The standard hidden Markov model (HMM) assumes that successive observations are independent of one another given the state sequence. This leads to a poor trajectory model for speech. Many explicit trajectory modeling techniques have been studied in the past to improve trajectory modeling for HMMs. However, these techniques have not yielded promising improvements over conventional HMM systems, where differential parameters and Gaussian mixture models are used implicitly to circumvent the poor trajectory modeling of the HMM. Recently, semi-parametric trajectory modeling techniques based on temporally varying model parameters, such as fMPE and pMPE, have been shown to yield promising improvements over state-of-the-art systems on large vocabulary continuous speech recognition tasks. These techniques use high-dimensional posterior features derived from a long span of acoustic features to model temporally varying attributes of the speech signal. Bases corresponding to these posterior features are then discriminatively estimated to yield temporally varying mean (fMPE) and precision matrix (pMPE) parameters. Motivated by the success of fMPE and pMPE, Temporally Varying Weight Regression (TVWR) was recently proposed to model HMM trajectories implicitly using time-varying Gaussian weights. In this paper, a complete formulation of TVWR is given based on a probabilistic modeling framework. Parameter estimation formulae using both maximum likelihood (ML) and minimum phone error (MPE) criteria are derived. Experimental results on the Wall Street Journal (CSR-WSJ0 + WSJ1) and Aurora 4 corpora show that consistent, promising improvements over the standard HMM systems can be obtained on both the 20k open vocabulary recognition task (NIST Nov'92 WSJ0) and the 5k closed vocabulary noisy speech recognition task, for both ML and MPE criteria.
Keywords :
Gaussian processes; acoustic signal processing; feature extraction; hidden Markov models; matrix algebra; maximum likelihood estimation; regression analysis; speech processing; speech recognition; vocabulary; Gaussian mixture model; HMM; ML estimation; MPE criteria; TVWR; acoustic feature estimation; automatic speech recognition; continuous speech recognition; fMPE; hidden Markov model; maximum likelihood estimation; minimum phone error; pMPE; parameter estimation; posterior feature estimation; precision matrix; probabilistic modeling framework; semiparametric trajectory model; speech signal processing; state sequence; temporal varying attribute; temporal varying weight regression; temporally varying mean; temporally varying model parameter; time-varying Gaussian weight; trajectory modeling technique; vocabulary recognition; Acoustics; Hidden Markov models; Maximum likelihood estimation; Speech; Speech recognition; Trajectory; Vocabulary; Acoustic modeling; hidden Markov model; pattern recognition; trajectory modeling;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2013.2285487
Filename :
6633086