DocumentCode
52197
Title
Temporally Varying Weight Regression: A Semi-Parametric Trajectory Model for Automatic Speech Recognition
Author
Shilin Liu ; Khe Chai Sim
Author_Institution
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
Volume
22
Issue
1
fYear
2014
fDate
Jan. 2014
Firstpage
151
Lastpage
160
Abstract
The standard hidden Markov model (HMM) assumes that successive observations are independent of one another given the state sequence. This leads to a poor trajectory model for speech. Many explicit trajectory modeling techniques have been studied in the past to improve trajectory modeling for HMMs. However, these techniques do not yield promising improvements over conventional HMM systems, in which differential parameters and Gaussian mixture models are used to implicitly circumvent the HMM's poor trajectory modeling. Recently, semi-parametric trajectory modeling techniques based on temporally varying model parameters, such as fMPE and pMPE, have been shown to yield promising improvements over state-of-the-art systems on large vocabulary continuous speech recognition tasks. These techniques use high-dimensional posterior features derived from a long span of acoustic features to model temporally varying attributes of the speech signal. Bases corresponding to these posterior features are then discriminatively estimated to yield temporally varying mean (fMPE) and precision matrix (pMPE) parameters. Motivated by the success of fMPE and pMPE, Temporally Varying Weight Regression (TVWR) was recently proposed to model HMM trajectories implicitly using time-varying Gaussian weights. In this paper, a complete formulation of TVWR is given based on a probabilistic modeling framework. Parameter estimation formulae using both the maximum likelihood (ML) and minimum phone error (MPE) criteria are derived. Experimental results on the Wall Street Journal (CSR-WSJ0 + WSJ1) and Aurora 4 corpora show consistent improvements over standard HMM systems on both the 20k open-vocabulary recognition task (NIST Nov'92 WSJ0) and the 5k closed-vocabulary noisy speech recognition task, under both the ML and MPE criteria.
Keywords
Gaussian processes; acoustic signal processing; feature extraction; hidden Markov models; matrix algebra; maximum likelihood estimation; regression analysis; speech processing; speech recognition; vocabulary; Gaussian mixture model; HMM; ML estimation; MPE criteria; TVWR; acoustic feature estimation; automatic speech recognition; continuous speech recognition; fMPE; hidden Markov model; maximum likelihood estimation; minimum phone error; pMPE; parameter estimation; posterior feature estimation; precision matrix; probabilistic modeling framework; semiparametric trajectory model; speech signal processing; state sequence; temporal varying attribute; temporal varying weight regression; temporally varying mean; temporally varying model parameter; time-varying Gaussian weight; trajectory modeling technique; vocabulary recognition; Acoustics; Hidden Markov models; Maximum likelihood estimation; Speech; Speech recognition; Trajectory; Vocabulary; Acoustic modeling; hidden Markov model; pattern recognition; trajectory modeling
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2013.2285487
Filename
6633086