  • DocumentCode
    1102092
  • Title
    A hierarchical decision approach to large-vocabulary discrete utterance recognition
  • Author
    Kaneko, Toyohisa; Dixon, N. Rex
  • Author_Institution
    IBM Japan Science Institute, Tokyo, Japan
  • Volume
    31
  • Issue
    5
  • fYear
    1983
  • fDate
    10/1/1983
  • Firstpage
    1061
  • Lastpage
    1066
  • Abstract
    Very short response time is a critical requirement for automatic discrete utterance recognition. The real-time vocabulary size of most of today's commercially available recognizers is limited to several hundred utterances, primarily because detailed acoustic matching involves considerable computation. The method presented here offers an economical solution to the real-time large-vocabulary recognition problem by carrying out recognition in two stages. In the initial stage, the incoming utterance is linearly matched against the entire vocabulary using only two features: utterance duration and either two or three average spectra per utterance. Although the number of prototypes matched is large, the time required per match is substantially reduced. During this initial stage, a preset number of best-match prototypes is determined for each unknown input. In the second stage, only the prototypes on this best-match list are matched, using more detailed features (e.g., 10-ms log-power spectra) and a more elaborate matching method (e.g., dynamic programming). Evaluation experiments were conducted using the 2000 most frequent words in an office-correspondence corpus, spoken by three normal adult-male talkers. First-stage best-match lists of 30-50 items included the "correct" word 99.0 to 99.5 percent of the time. Using DP on 10-ms spectral samples for the second stage, recognition accuracy ranged from 86.5 to 94.5 percent. When the first stage is used as a match-limiter in front of a commercially available 50-64-word recognizer serving as the second stage, near-real-time large-vocabulary recognition becomes feasible.
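    The two-stage decision described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the equal-length time segmentation, the stage-1 distance (Euclidean distance between mean spectra plus a weighted duration difference), the plain DTW used in stage 2, and the names `coarse_features`, `coarse_distance`, `dtw_distance`, and `recognize` are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one short-time spectrum, e.g. a 10-ms log-power frame (assumed)
Coarse = Tuple[int, List[List[float]]]  # (duration in frames, per-segment mean spectra)


def coarse_features(frames: Sequence[Frame], n_segments: int = 3) -> Coarse:
    """Stage-1 features: duration plus the mean spectrum of each of
    n_segments roughly equal time segments (equal split is an assumption)."""
    n, dim = len(frames), len(frames[0])
    means = []
    for s in range(n_segments):
        start = s * n // n_segments
        end = max((s + 1) * n // n_segments, start + 1)  # never an empty segment
        chunk = frames[start:end]
        means.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return n, means


def coarse_distance(a: Coarse, b: Coarse, duration_weight: float = 1.0) -> float:
    """Cheap fixed-length comparison (assumed form): Euclidean distance between
    the concatenated mean spectra plus a weighted duration difference."""
    (dur_a, spec_a), (dur_b, spec_b) = a, b
    sq = sum((x - y) ** 2 for va, vb in zip(spec_a, spec_b) for x, y in zip(va, vb))
    return math.sqrt(sq) + duration_weight * abs(dur_a - dur_b)


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Basic DTW using the three standard local steps (i-1,j), (i,j-1), (i-1,j-1)."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # frame-to-frame Euclidean distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(unknown: Sequence[Frame], prototypes: Dict[str, Sequence[Frame]],
              shortlist_size: int = 30, n_segments: int = 3) -> Tuple[str, List[str]]:
    """Two-stage decision: cheap linear scan to a shortlist, then DTW on it."""
    # In a real system the prototype coarse features would be computed once, offline.
    proto_coarse = {w: coarse_features(p, n_segments) for w, p in prototypes.items()}
    u = coarse_features(unknown, n_segments)
    # Stage 1: match against the whole vocabulary, keep the best shortlist_size words.
    shortlist = heapq.nsmallest(shortlist_size, proto_coarse,
                                key=lambda w: coarse_distance(u, proto_coarse[w]))
    # Stage 2: detailed frame-by-frame DTW only on the shortlisted prototypes.
    best = min(shortlist, key=lambda w: dtw_distance(unknown, prototypes[w]))
    return best, shortlist
```

    In this sketch the per-word cost of stage 1 is a fixed-size vector comparison, independent of utterance length, while the quadratic DTW runs on at most `shortlist_size` prototypes whatever the vocabulary size; the default of 30 simply mirrors the lower end of the 30-50-item lists reported above.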
  • Keywords
    Delay; Dynamic programming; Hardware; Linear predictive coding; Parallel processing; Prototypes; Real time systems; Robustness; Speech recognition; Vocabulary
  • fLanguage
    English
  • Journal_Title
    Acoustics, Speech and Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0096-3518
  • Type
    jour
  • DOI
    10.1109/TASSP.1983.1164211
  • Filename
    1164211