DocumentCode :
1288306
Title :
Phoneme-based vector quantization in a discrete HMM speech recognizer
Author :
Zhang, Yaxin ; Togneri, Roberto ; Alder, Michael
Author_Institution :
Motorola Australian Res. Centre, Botany, NSW, Australia
Volume :
5
Issue :
1
fYear :
1997
fDate :
1/1/1997
Firstpage :
26
Lastpage :
32
Abstract :
The quantization distortion of vector quantization (VQ) is a key element that affects the performance of a discrete hidden Markov modeling (DHMM) system. Many researchers have recognized this problem and have tried to use integrated features or multiple codebooks in their systems to offset the disadvantage of conventional VQ. However, the computational complexity of those systems is then increased. Investigations have shown that the speech signal space consists of a finite number of clusters that represent phoneme data sets from male and female speakers and that exhibit Gaussian distributions. We propose an alternative VQ method in which each phoneme is treated as a cluster in the speech space and a Gaussian model is estimated for each phoneme. A Gaussian mixture model (GMM) is generated by the expectation-maximization (EM) algorithm for the whole speech space and used as a codebook in which each code word is a Gaussian model and represents a certain cluster. An input utterance is classified as a certain phoneme or a set of phonemes only when that phoneme or those phonemes give the highest likelihood. A typical discrete HMM system was used for both phoneme and isolated word recognition. The results show that the phoneme-based Gaussian modeling vector quantization classifies the speech space more effectively, and significant improvements in the performance of the DHMM system have been achieved.
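The core idea in the abstract, a codebook whose code words are Gaussian models and an input assigned to whichever model gives the highest likelihood, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal covariances, estimates each phoneme's Gaussian directly from labeled frames, and omits the EM step that the abstract describes for the full mixture. The function names (`fit_phoneme_codebook`, `quantize`) and the variance floor are hypothetical choices made for illustration.

```python
import numpy as np

def log_gaussian_diag(frames, mean, var):
    """Log density of each frame under one diagonal-covariance Gaussian."""
    diff = np.asarray(frames, dtype=float) - mean
    norm = np.sum(np.log(2.0 * np.pi * var))
    return -0.5 * (norm + np.sum(diff * diff / var, axis=1))

def fit_phoneme_codebook(frames, labels, var_floor=1e-6):
    """Estimate one Gaussian (mean, variance) per phoneme label.

    Assumption: frames are already labeled by phoneme; no EM refinement
    is performed here, unlike the method summarized in the abstract.
    """
    frames = np.asarray(frames, dtype=float)
    labels = np.asarray(labels)
    codebook = {}
    for phoneme in sorted(set(labels.tolist())):
        subset = frames[labels == phoneme]
        # The variance floor (an assumed value) avoids division by zero.
        codebook[phoneme] = (subset.mean(axis=0), subset.var(axis=0) + var_floor)
    return codebook

def quantize(frames, codebook):
    """Map each frame to the code word (phoneme) with the highest likelihood."""
    names = list(codebook)
    scores = np.stack(
        [log_gaussian_diag(frames, *codebook[name]) for name in names], axis=1
    )
    return [names[i] for i in np.argmax(scores, axis=1)]
```

In a discrete HMM front end, the symbols returned by `quantize` would take the place of the codebook indices produced by a conventional distance-based VQ.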
Keywords :
Gaussian distribution; hidden Markov models; speech coding; speech processing; speech recognition; vector quantisation; Gaussian distributions; Gaussian mixture model; Gaussian model; VQ; code word; computational complexity; discrete HMM speech recognizer; discrete HMM system; discrete hidden Markov modeling; expectation-maximization algorithm; female speakers; input utterance; integrated feature; isolated word recognition; male speakers; multiple codebook; performance; phoneme based vector quantization; phoneme data sets; phoneme recognition; quantization distortion; speech signal space; Associate members; Australia; Clustering algorithms; Computational complexity; Computational efficiency; Gaussian distribution; Hidden Markov models; Partitioning algorithms; Speech recognition; Vector quantization;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.554266
Filename :
554266