DocumentCode
310558
Title
Discrete mixture HMM
Author
Takahashi, Satoshi ; Aikawa, Kiyoaki ; Sagayama, Shigeki
Author_Institution
NTT Human Interface Labs., Kanagawa, Japan
Volume
2
fYear
1997
fDate
21-24 Apr 1997
Firstpage
971
Abstract
This paper proposes a new type of acoustic model called the discrete mixture HMM (DMHMM). As large-scale speech databases have been constructed for speaker-independent HMMs, continuous mixture HMMs (CMHMMs) need more mixture components to represent complex distributions, which raises the computational cost of calculating the output probabilities. The DMHMM represents the feature parameter space with mixtures of multivariate distributions, in the same way as the diagonal-covariance CMHMM. Instead of using Gaussian mixtures to represent the feature distribution in each dimension, however, the DMHMM uses mixtures of discrete distributions based on scalar quantization (SQ). Because a discrete distribution has more degrees of freedom in representation, the DMHMM can represent the feature distributions efficiently with fewer mixture components. In isolated word recognition experiments on telephone speech, the DMHMM outperformed the CMHMM when the models had the same number of mixture components.
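As a rough illustration of the idea in the abstract, the sketch below computes a state output probability as a weighted sum over mixture components, where each component is a product of per-dimension discrete distributions indexed by a scalar-quantized feature value. It is not taken from the paper: the function names, the bin-boundary representation, and the toy numbers are all assumptions made for illustration, and every table entry is assumed to be strictly positive (for example, floored during training).

```python
import bisect
import math


def quantize(value, boundaries):
    """Scalar quantization (hypothetical helper): index of the bin holding value.

    boundaries is a sorted list of bin edges; K edges give K + 1 bins.
    """
    return bisect.bisect_right(boundaries, value)


def dmhmm_log_output_prob(obs, boundaries, weights, tables):
    """Log output probability of one state under the assumed form

        b(o) = sum_m  w_m * prod_d  p_md[ q_d(o_d) ]

    obs        : feature vector, one float per dimension
    boundaries : per-dimension lists of SQ bin edges
    weights    : mixture weights w_m (assumed to sum to 1)
    tables     : tables[m][d][k] = probability of code k in dimension d
                 for mixture m (assumed strictly positive)
    """
    # Quantize each dimension once; the codes are shared by all mixtures.
    codes = [quantize(x, b) for x, b in zip(obs, boundaries)]

    log_terms = []
    for w, table in zip(weights, tables):
        lp = math.log(w)
        for d, k in enumerate(codes):
            lp += math.log(table[d][k])  # table lookup replaces a Gaussian evaluation
        log_terms.append(lp)

    # Log-sum-exp over mixtures for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


# Toy example: 2 dimensions, 2 bins per dimension, 2 mixture components.
boundaries = [[0.0], [0.0]]
weights = [0.5, 0.5]
tables = [
    [[0.9, 0.1], [0.8, 0.2]],  # mixture 0
    [[0.2, 0.8], [0.3, 0.7]],  # mixture 1
]
log_b = dmhmm_log_output_prob([-1.0, 1.0], boundaries, weights, tables)
```

In this toy case the observation falls in bin 0 of the first dimension and bin 1 of the second, so the two mixture terms are 0.5 × 0.9 × 0.2 and 0.5 × 0.2 × 0.7. The per-dimension work is a table lookup rather than a Gaussian density evaluation, which is one way to read the abstract's point about computational cost.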
Keywords
acoustic signal processing; hidden Markov models; parameter estimation; probability; quantisation (signal); speech processing; speech recognition; statistical analysis; CMHMM; DMHMM; acoustic model; complex distributions; computational cost; continuous mixture HMM; diagonal covariance CMHMM; discrete distributions; discrete mixture HMM; feature distribution representation; feature parameter space; isolated word recognition experiments; large scale speech databases; mixture components; multivariate distributions; output probabilities; scalar quantization; speaker independent HMM; telephone speech; Acoustic distortion; Computational efficiency; Covariance matrix; Gaussian distribution; Hidden Markov models; Humans; Quantization; Shape; Spatial databases; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location
Munich
ISSN
1520-6149
Print_ISBN
0-8186-7919-0
Type
conf
DOI
10.1109/ICASSP.1997.596100
Filename
596100