• DocumentCode
    2993597
  • Title

    An efficient vector-quantization preprocessor for speaker independent isolated word recognition

  • Author

    Pan, K.C. ; Soong, F.K. ; Rabiner, L.R. ; Bergh, A.F.

  • Author_Institution
    AT&T Bell Laboratories, Murray Hill, New Jersey
  • Volume
    10
  • fYear
    1985
  • fDate
    31138
  • Firstpage
    874
  • Lastpage
    877
  • Abstract
    Recently, a new structure for isolated word recognition was proposed based on the ideas of vector quantization (VQ). In this scheme, a separate VQ codebook was designed for each word in the vocabulary, based on a training sequence of several tokens of each word by one or more talkers. In the original implementation, the recognizer chose the word in the vocabulary whose average quantization distortion (according to its particular codebook) was minimum. In the proposed implementation, the word-based VQs are used as a front-end preprocessor to eliminate word candidates whose distortion scores are large; a DTW processor then resolves the choice among the remaining word candidates (i.e., those which are passed on by the preprocessor). Both of the above schemes work very well for small vocabularies; however, the major flaw is the lack of temporal information in the word-based VQ processor. As a result, as the vocabulary for recognition grows in size and complexity, the ability of the VQ processor to resolve among similar-sounding words decreases dramatically, and the effectiveness of the proposed recognition structure similarly decreases. To alleviate this difficulty, a technique for incorporating temporal structure into the preprocessor is also proposed. In particular, the probability density function of the time of occurrence for each vector in the codebook is estimated from the same training sequence used to derive the codebook vectors. In the recognizer, the spectral distance score of the VQ is combined with a (scaled) temporal distance score for each frame in the word. An evaluation of the proposed recognizer showed good performance both on the digits vocabulary and on a vocabulary of 129 airlines terms.
  • Keywords
    Algorithm design and analysis; Autocorrelation; Desktop publishing; Digital signal processing; Linear predictive coding; Logic; Training data; Vector quantization; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85.
  • Type

    conf

  • DOI
    10.1109/ICASSP.1985.1168317
  • Filename
    1168317
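
The preprocessing idea summarized in the abstract (one codebook per word, average quantization distortion, an added scaled temporal term, and pruning of poor candidates) can be illustrated with a minimal sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: squared Euclidean distance stands in for whatever spectral distortion measure the paper uses, the time-of-occurrence density of each code vector is modeled as a Gaussian over normalized time (so the temporal distance becomes a squared z-score), the combined score is minimized jointly over code vectors, and the names `word_score`, `preselect`, `alpha`, and `keep` are hypothetical. The DTW stage that would resolve the surviving candidates is not shown.

```python
import math

def sq_dist(a, b):
    # Assumed spectral distortion: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def word_score(frames, codebook, times=None, alpha=0.0):
    """Average per-frame distortion of an utterance against one word's codebook.

    frames   : list of feature vectors, one per frame
    codebook : list of code vectors for this word
    times    : optional list of (mean, std) of normalized occurrence time,
               one pair per code vector (assumed Gaussian model)
    alpha    : scale factor applied to the temporal distance
    """
    n = len(frames)
    total = 0.0
    for i, frame in enumerate(frames):
        t = i / (n - 1) if n > 1 else 0.0  # normalized time in [0, 1]
        best = math.inf
        for k, code in enumerate(codebook):
            d = sq_dist(frame, code)
            if times is not None:
                mean, std = times[k]
                d += alpha * ((t - mean) / std) ** 2  # scaled temporal distance
            best = min(best, d)
        total += best
    return total / n

def preselect(frames, models, keep=2, alpha=0.0):
    """Score every word model and keep the `keep` lowest-distortion candidates."""
    scores = {
        word: word_score(frames, m["codebook"], m.get("times"), alpha)
        for word, m in models.items()
    }
    ranked = sorted(scores, key=scores.get)
    return ranked[:keep], scores

# Toy data: two "words" that share the same code vectors in opposite time order.
models = {
    "ab": {"codebook": [[0.0], [1.0]], "times": [(0.0, 0.5), (1.0, 0.5)]},
    "ba": {"codebook": [[0.0], [1.0]], "times": [(1.0, 0.5), (0.0, 0.5)]},
}
utterance = [[0.0], [0.0], [1.0], [1.0]]  # sounds like "ab"

_, spectral_only = preselect(utterance, models, keep=1, alpha=0.0)
best_with_time, with_time = preselect(utterance, models, keep=1, alpha=0.1)
```

In this toy case the two words are indistinguishable on spectral distortion alone, since their codebooks contain the same vectors, while the temporal term separates them. That is the failure mode, and the remedy, that the abstract describes for larger vocabularies. In a full system the list returned by `preselect` would be handed to a DTW matcher.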