Title :
A left-to-right HDP-HMM with HDPM emissions
Author :
Torbati, Amir Hossein Harati Nejad ; Picone, J. ; Sobel, Marc
Author_Institution :
Dept. of Electr. & Comput. Eng., Temple Univ., Philadelphia, PA, USA
Abstract :
Nonparametric Bayesian models use a Bayesian framework to learn model complexity automatically from the data, eliminating the need for a complex model selection process. The Hierarchical Dirichlet Process hidden Markov model (HDP-HMM) is the nonparametric Bayesian equivalent of an HMM. However, the HDP-HMM is restricted to an ergodic topology and uses a Dirichlet Process Model (DPM) to achieve a mixture distribution-like model. For applications such as speech recognition, where we deal with ordered sequences, it is desirable to impose a left-to-right structure on the model to better capture the sequential nature of the speech signal. In this paper, we introduce three enhancements to the HDP-HMM: (1) a left-to-right structure, needed for sequential decoding of speech; (2) non-emitting initial and final states, required for modeling finite-length sequences; and (3) HDP mixture emissions, which allow data to be shared across states. The latter is particularly important for speech recognition because Gaussian mixture models have been very effective at modeling speaker variability. Further, due to the nature of language, some models occur infrequently and have few data points associated with them, even in large corpora; sharing allows these models to be estimated more accurately. We demonstrate that this new HDP-HMM produces a 15% increase in likelihood and a 15% relative reduction in error rate on a phoneme classification task based on the TIMIT Corpus.
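A minimal, hypothetical sketch (not the authors' implementation) of two ideas mentioned in the abstract: truncated stick-breaking weights for a Dirichlet-process mixture, and a transition matrix constrained to a left-to-right topology. The function names, the truncation level, and the self-transition bias kappa are illustrative assumptions; the paper's actual inference procedure is not shown.

```python
# Illustrative sketch only: truncated stick-breaking weights and a
# left-to-right HMM transition matrix. Assumed names and parameters.
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, num_atoms):
    """Draw truncated stick-breaking weights (GEM(alpha), truncated to num_atoms)."""
    v = rng.beta(1.0, alpha, size=num_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def left_to_right_transitions(num_states, alpha, kappa):
    """Sample a transition row per state, then zero out backward jumps so
    state indices are non-decreasing (left-to-right topology)."""
    A = np.zeros((num_states, num_states))
    for i in range(num_states):
        row = stick_breaking(alpha, num_states)
        row[i] += kappa        # bias toward self-transition (assumed value)
        row[:i] = 0.0          # forbid transitions to earlier states
        A[i] = row / row.sum()
    return A

A = left_to_right_transitions(num_states=5, alpha=2.0, kappa=1.0)
print(np.round(A, 3))          # upper-triangular: only self- and forward moves
```

Under these assumptions, each row is a normalized distribution over the current and later states only, which is the structural constraint a left-to-right topology imposes on the otherwise ergodic HDP-HMM.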
Keywords :
Bayes methods; Gaussian processes; decoding; hidden Markov models; mixture models; speech recognition; Bayesian framework; DPM; Dirichlet Process Model; Gaussian mixture models; HDP mixture emissions; HDP-HMM; TIMIT Corpus; data points; ergodic topology; final states; finite length sequences; hierarchical Dirichlet Process hidden Markov model; left-to-right structure; mixture distribution-like model; nonemitting initial states; nonparametric Bayesian models; phoneme classification task; sequential nature; sequential speech decoding; speaker variability; speech signal; Computational modeling; Data models; Speech; Topology; hierarchical Dirichlet processes;
Conference_Title :
2014 48th Annual Conference on Information Sciences and Systems (CISS)
Conference_Location :
Princeton, NJ
DOI :
10.1109/CISS.2014.6814172