Discrete mixture HMM

Author

Takahashi, Satoshi; Aikawa, Kiyoaki; Sagayama, Shigeki

Author_Institution

NTT Human Interface Labs., Kanagawa, Japan

Volume

2

fYear

1997

fDate

21-24 Apr 1997

Firstpage

971

Abstract

This paper proposes a new type of acoustic model called the discrete mixture HMM (DMHMM). As large-scale speech databases have been constructed for speaker-independent HMMs, continuous mixture HMMs (CMHMMs) need more mixture components to represent complex distributions, which leads to a high computational cost for calculating the output probabilities. The DMHMM represents the feature parameter space by using mixtures of multivariate distributions in the same way as the diagonal-covariance CMHMM. However, instead of using Gaussian mixtures to represent the feature distribution in each dimension, the DMHMM uses mixtures of discrete distributions based on scalar quantization (SQ). Since a discrete distribution has more degrees of freedom in the shapes it can represent, the DMHMM can represent the feature distributions efficiently with fewer mixture components. In isolated word recognition experiments on telephone speech, we found that the DMHMM outperformed the CMHMMs when the models had the same number of mixture components.
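To make the contrast in the abstract concrete, here is a minimal sketch of a single state's output probability under each model. It assumes the DMHMM output probability has the form b(o) = sum_m c_m * prod_d P_{m,d}[q_d(o_d)], where q_d is a scalar quantizer for dimension d and P_{m,d} is a discrete probability table. The function names, the choice of sharing one set of quantizer boundaries per dimension across all components, and every numeric value are hypothetical illustrations, not the paper's design. The abstract does not describe quantizer design or training, so those are left out.

```python
import bisect
import math
from typing import Sequence


def scalar_quantize(x: float, boundaries: Sequence[float]) -> int:
    """Map a scalar to a code index using sorted decision boundaries.

    K-1 boundaries define K codes (0..K-1). A value equal to a boundary
    goes to the upper code (bisect_right semantics).
    """
    return bisect.bisect_right(boundaries, x)


def dmhmm_output_prob(
    obs: Sequence[float],
    weights: Sequence[float],
    tables: Sequence[Sequence[Sequence[float]]],
    boundaries: Sequence[Sequence[float]],
) -> float:
    """Discrete-mixture output probability (illustrative sketch).

    weights[m]      : mixture weight c_m
    tables[m][d][k] : probability of code k in dimension d for component m
    boundaries[d]   : SQ decision boundaries for dimension d
                      (assumed shared by all components)
    """
    # Quantize each dimension once; every component then needs only table lookups.
    codes = [scalar_quantize(x, b) for x, b in zip(obs, boundaries)]
    total = 0.0
    for c_m, table_m in zip(weights, tables):
        prod = c_m
        for d, k in enumerate(codes):
            prod *= table_m[d][k]
        total += prod
    return total


def cmhmm_output_prob(
    obs: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """Diagonal-covariance Gaussian mixture output probability, for comparison."""
    total = 0.0
    for c_m, mu_m, var_m in zip(weights, means, variances):
        prod = c_m
        for x, mu, var in zip(obs, mu_m, var_m):
            # Each dimension needs a Gaussian density evaluation (exp + sqrt).
            prod *= math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += prod
    return total


if __name__ == "__main__":
    bounds = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]  # 4 codes per dimension
    w = [0.6, 0.4]
    tabs = [
        [[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]],
        [[0.4, 0.3, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]],
    ]
    print(dmhmm_output_prob([0.5, -1.5], w, tabs, bounds))
    print(cmhmm_output_prob([0.0], [1.0], [[0.0]], [[1.0]]))
```

For the observation [0.5, -1.5], the codes are 2 and 0, so the DMHMM value is 0.6 * 0.3 * 0.25 + 0.4 * 0.2 * 0.7, approximately 0.101. The sketch shows why the per-dimension discrete tables are cheap: after one quantization per dimension, each component costs only multiplications of looked-up values. In contrast, the Gaussian version evaluates an exponential for every dimension of every component. Each discrete table can also take an arbitrary shape over its codes, which reflects the abstract's point about degrees of freedom.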

Keywords

acoustic signal processing; hidden Markov models; parameter estimation; probability; quantisation (signal); speech processing; speech recognition; statistical analysis; CMHMM; DMHMM; acoustic model; complex distributions; computational cost; continuous mixture HMM; diagonal covariance CMHMM; discrete distributions; discrete mixture HMM; feature distribution representation; feature parameter space; isolated word recognition experiments; large scale speech databases; mixture components; multivariate distributions; output probabilities; scalar quantization; speaker independent HMM; telephone speech; Acoustic distortion; Computational efficiency; Covariance matrix; Gaussian distribution; Hidden Markov models; Humans; Quantization; Shape; Spatial databases; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location

Munich

ISSN

1520-6149

Print_ISBN

0-8186-7919-0

Type

conf

DOI

10.1109/ICASSP.1997.596100

Filename

596100