Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - MAP decoding and evaluation

Author

Seide, Frank ; Zhou, Jian-lai ; Deng, Li

Author_Institution

5F Beijing Sigma Center, Microsoft Research Asia, Beijing, China

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

The hidden dynamic model (HDM) has been an attractive acoustic modeling approach because it provides a computational model for coarticulation and the dynamics of human speech. However, the lack of a direct decoding algorithm has been a barrier to research progress on HDM. We have developed a new HDM-based acoustic model, the hidden-trajectory HMM (HTHMM), which combines the state/mixture topology of a traditional monophone HMM with a target-directed hidden-trajectory model (a special form of HDM) for coarticulation modeling. Because the classical Viterbi algorithm is not admissible, we have developed a novel MAP decoding algorithm for HTHMM that correctly takes the hidden continuous trajectory into account. This paper introduces our new HTHMM decoder, which allows us, for the first time, to evaluate an HDM-type model by direct decoding instead of N-best rescoring. Using direct decoding, we demonstrate that the coarticulatory mechanism of our HTHMM matches traditional context-dependent modeling (enumeration of model parameters): the context-independent HTHMM has slightly better accuracy than a crossword-triphone HMM on the Aurora2 task. The decoder also enables us to include state-boundary optimization in the HDM/HTHMM training procedure. This paper presents the detailed decoding algorithm and evaluation results; the HTHMM model itself and its parameter training are presented in Zhou et al. (2003).

Keywords

hidden Markov models; maximum likelihood decoding; speech processing; speech recognition; Aurora2 task; HDM; HMM; HTHMM; MAP decoding; acoustic modeling; coarticulation modeling; direct decoding; embedding; hidden continuous trajectory; hidden dynamic model; hidden-trajectory HMM; monophone HMM; state-boundary optimization; state/mixture topology; target-directed hidden trajectory model; training procedure; Asia; Computational modeling; Context modeling; Decoding; Hidden Markov models; Humans; Speech analysis; Topology; Trajectory; Viterbi algorithm;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198889

Filename

1198889