DocumentCode
2993597
Title
An efficient vector-quantization preprocessor for speaker independent isolated word recognition
Author
Pan, K.C. ; Soong, F.K. ; Rabiner, L.R. ; Bergh, A.F.
Author_Institution
AT&T Bell Laboratories, Murray Hill, New Jersey
Volume
10
fYear
1985
fDate
31138
Firstpage
874
Lastpage
877
Abstract
Recently, a new structure for isolated word recognition was proposed based on the ideas of vector quantization (VQ). In this scheme, a separate VQ codebook was designed for each word in the vocabulary, based on a training sequence of several tokens of each word by one or more talkers. In the original implementation, the recognizer chose the word in the vocabulary whose average quantization distortion (according to its particular codebook) was minimum. In the proposed implementation, the word-based VQs are used as a front-end preprocessor to eliminate word candidates whose distortion scores are large; a DTW processor then resolves the choice among the remaining word candidates (i.e., those which are passed on by the preprocessor). Both of the above schemes work very well for small vocabularies; however, their major flaw is the lack of temporal information in the word-based VQ processor. As a result, as the vocabulary for recognition grows in size and complexity, the ability of the VQ processor to resolve among similar-sounding words decreases dramatically, and the effectiveness of the proposed recognition structure decreases with it. To alleviate this difficulty, a technique for incorporating temporal structure into the preprocessor is also proposed. In particular, the probability density function of the time of occurrence for each vector in the codebook is estimated from the same training sequence used to derive the codebook vectors. In the recognizer, the spectral distance score of the VQ is combined with a (scaled) temporal distance score for each frame in the word. An evaluation of the proposed recognizer showed good performance on both the digits vocabulary and a vocabulary of 129 airline terms.
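The preprocessing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: squared Euclidean distance stands in for the spectral distortion measure, the time of occurrence of each codeword is modeled as a Gaussian over normalized time, the per-frame cost is minimized over the combined spectral and scaled temporal terms, and pruning keeps a fixed number of best-scoring words. The names `word_score`, `preselect`, `alpha`, and `keep` are hypothetical, and the DTW stage that would resolve the surviving candidates is omitted.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance (a stand-in for the spectral distortion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def word_score(frames, codebook, time_stats, alpha):
    """Average per-frame cost of a token against one word's codebook.

    codebook   : list of codeword vectors for this word
    time_stats : list of (mean, std) of the normalized time of occurrence
                 of each codeword (assumed Gaussian, std > 0)
    alpha      : scale on the temporal term; alpha = 0 gives plain
                 average quantization distortion
    """
    n = len(frames)
    if n == 0:
        raise ValueError("token has no frames")
    total = 0.0
    for t, frame in enumerate(frames):
        tau = t / (n - 1) if n > 1 else 0.0  # normalized time in [0, 1]
        best = math.inf
        for code, (mu, sigma) in zip(codebook, time_stats):
            spectral = sq_dist(frame, code)
            temporal = ((tau - mu) / sigma) ** 2  # assumed temporal distance
            best = min(best, spectral + alpha * temporal)
        total += best
    return total / n


def preselect(frames, models, alpha=0.0, keep=2):
    """Score every word and keep the `keep` lowest-cost candidates.

    models maps word -> (codebook, time_stats). The returned candidates
    would be passed on to a DTW stage, which is not shown here.
    """
    scores = {w: word_score(frames, cb, ts, alpha)
              for w, (cb, ts) in models.items()}
    ranked = sorted(scores, key=scores.get)
    return ranked[:keep], scores


if __name__ == "__main__":
    # Two toy "words" with identical codewords but opposite temporal order.
    models = {
        "ab": ([[0.0], [1.0]], [(0.25, 0.25), (0.75, 0.25)]),
        "ba": ([[0.0], [1.0]], [(0.75, 0.25), (0.25, 0.25)]),
    }
    token = [[0.0], [0.0], [1.0], [1.0]]
    print(preselect(token, models, alpha=0.0))  # spectral score only
    print(preselect(token, models, alpha=1.0))  # with temporal term
```

In the toy data, both words share the same codewords, so the purely spectral score cannot separate them; adding the scaled temporal term makes the word whose codewords occur in the matching order score lower. This mirrors, in a much simplified form, the motivation given in the abstract for adding temporal structure to the preprocessor.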
Keywords
Algorithm design and analysis; Autocorrelation; Desktop publishing; Digital signal processing; Linear predictive coding; Logic; Training data; Vector quantization; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85.
Type
conf
DOI
10.1109/ICASSP.1985.1168317
Filename
1168317