A hierarchical decision approach to large-vocabulary discrete utterance recognition

Author

Kaneko, Toyohisa ; Dixon, N. Rex

Author_Institution

IBM Japan Science Institute, Tokyo, Japan

Volume

31

Issue

5

fYear

1983

fDate

10/1/1983 12:00:00 AM

Firstpage

1061

Lastpage

1066

Abstract

Very short response time is a critical requirement for automatic discrete utterance recognition. The real-time vocabulary size of most of today's commercially available recognizers is limited to several hundreds of utterances, primarily due to the fact that detailed acoustic matching involves considerable computation. The method presented here offers an economical solution to the real-time large-vocabulary recognition problem by carrying out recognition in two stages. In the initial stage, the incoming utterance is linearly matched against the entire vocabulary using only two features: utterance duration and either two or three average spectra for each utterance. While the number of prototypes matched is large, the time required per match is substantially reduced. During this initial stage, a preset number of best-match prototypes is determined for each unknown input. In the second stage, matching is performed for the best-match list based upon more detailed features (e.g., 10-ms log-power spectra), using more elaborate matching methodology, e.g., dynamic programming. Evaluation experiments were conducted using the 2000 most frequent words in an office-correspondence corpus and three normal adult-male talkers. It was observed that first-stage best-match lists of 30-50 items included the "correct" words between 99.0 and 99.5 percent of the time. Using DP on 10-ms spectral samples for the second stage, recognition accuracy ranged from 86.5 to 94.5 percent. A match-limiter, when used with a 50-64-word, commercially available recognizer for the second stage, makes near-real-time large-vocabulary recognition feasible.
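The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the average spectra are formed or which distance measures are used, so the equal-length segmentation, the Euclidean frame distance, the `duration_weight` parameter, and all function names (`coarse_features`, `coarse_distance`, `dtw_distance`, `recognize`) are assumptions made for illustration. The toy two-dimensional "spectra" stand in for real 10-ms log-power spectra.

```python
import math


def coarse_features(frames, k=3):
    """Return (duration in frames, k segment-averaged spectra).

    Assumption: the k average spectra come from k equal-length segments.
    """
    n = len(frames)
    dim = len(frames[0])
    segments = []
    for s in range(k):
        lo, hi = s * n // k, (s + 1) * n // k
        # Guard against empty segments when the utterance has fewer than k frames.
        chunk = frames[lo:hi] or [frames[min(lo, n - 1)]]
        segments.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return n, segments


def coarse_distance(a, b, duration_weight=1.0):
    """Cheap first-stage distance: duration gap plus segment-wise spectral gap.

    Assumption: Euclidean distance and a simple weighted sum.
    """
    dur_a, seg_a = a
    dur_b, seg_b = b
    spectral = sum(math.dist(x, y) for x, y in zip(seg_a, seg_b))
    return duration_weight * abs(dur_a - dur_b) + spectral


def dtw_distance(a, b):
    """Second-stage distance: basic dynamic-programming time alignment."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(math.dist(x, y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def recognize(unknown, prototypes, shortlist_size=30, k=3):
    """Stage 1 ranks the whole vocabulary cheaply; stage 2 runs DP on the shortlist."""
    u = coarse_features(unknown, k)
    ranked = sorted(
        prototypes,
        key=lambda w: coarse_distance(u, coarse_features(prototypes[w], k)),
    )
    shortlist = ranked[:shortlist_size]
    best = min(shortlist, key=lambda w: dtw_distance(unknown, prototypes[w]))
    return best, shortlist


# Toy vocabulary (hypothetical data): word -> list of per-frame feature vectors.
prototypes = {
    "yes": [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4,
    "no": [[0.0, 1.0]] * 6,
    "maybe": [[1.0, 1.0]] * 12,
}
unknown = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 4

best, shortlist = recognize(unknown, prototypes, shortlist_size=2)
print(best, shortlist)
```

In a real system the coarse features of every prototype would be computed once and stored, so the first stage costs only one small comparison per vocabulary entry, while the expensive dynamic-programming match runs only on the shortlist (30-50 items in the reported experiments).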

Keywords

Delay; Dynamic programming; Hardware; Linear predictive coding; Parallel processing; Prototypes; Real time systems; Robustness; Speech recognition; Vocabulary

fLanguage

English

Journal_Title

Acoustics, Speech and Signal Processing, IEEE Transactions on

Publisher

IEEE

ISSN

0096-3518

Type

jour

DOI

10.1109/TASSP.1983.1164211

Filename

1164211