Author :
Deshmukh, Neeraj ; Ganapathiraju, Aravind ; Picone, Joseph
Abstract :
Large vocabulary continuous speech recognition (LVCSR) systems have advanced significantly due to the ability to handle extremely large problem spaces in fairly small amounts of memory. The article introduces the search problem, discusses in detail a typical implementation of a search engine, and demonstrates the efficacy of this approach on a range of problems. The approach presented is scalable across a wide range of applications. It is designed to address research needs, where a premium is placed on the flexibility of the system architecture, and the needs of application prototypes, which require near-real-time speed without a great sacrifice in word error rate (WER). One major area of focus for researchers is the development of real-time systems. With only minor degradations in performance (typically, no more than a 25% increase in WER), the systems described in this article can be transformed into systems that operate at 10×RT or less. There are four active areas of research related to this problem. First, more intelligent pruning algorithms that prune the search space more heavily are required. Look-ahead and N-best strategies at all levels of the system are key to achieving such large reductions in the search space. Second, multi-pass systems, which perform a quick search using a simple system and then rescore only the N-best resulting hypotheses using better models, are very popular for real-time implementation. Third, since much of the computation in these systems is devoted to acoustic model processing, fast-matching strategies within the acoustic model are important. Finally, since Gaussian evaluation at each state in the system is a major consumer of CPU time, vector quantization-like approaches that compute only a small number of Gaussians per frame have proven successful. In some sense, the Viterbi (1967) based system presented represents only one path through this continuum of recognition search strategies.
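As an illustration of the kind of pruned Viterbi search the abstract refers to, the following is a minimal sketch and not the authors' implementation. The function name viterbi_beam, the beam parameter, and the two-state toy model are all assumptions introduced here for illustration; a real LVCSR decoder works over a far larger, hierarchically structured search space.

```python
import math

def viterbi_beam(obs, states, log_start, log_trans, log_emit, beam=math.inf):
    """Return (best_log_score, best_state_path) for an observation sequence.

    Hypotheses scoring more than `beam` below the best hypothesis of the
    current frame are pruned before expansion; beam=inf gives the exact
    (unpruned) Viterbi search.
    """
    # active maps state -> (log score, state path so far)
    active = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # Beam pruning: keep only hypotheses close to the current best.
        best = max(score for score, _ in active.values())
        survivors = {s: v for s, v in active.items() if v[0] >= best - beam}
        nxt = {}
        for s in states:
            # Pick the best surviving predecessor for state s.
            prev, (score, path) = max(
                survivors.items(),
                key=lambda kv: kv[1][0] + log_trans[kv[0]][s],
            )
            nxt[s] = (score + log_trans[prev][s] + log_emit[s][o], path + [s])
        active = nxt
    return max(active.values(), key=lambda v: v[0])

# Hypothetical two-state toy model (not taken from the article).
STATES = ["Healthy", "Fever"]
LOG_START = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
LOG_TRANS = {
    "Healthy": {"Healthy": math.log(0.7), "Fever": math.log(0.3)},
    "Fever": {"Healthy": math.log(0.4), "Fever": math.log(0.6)},
}
LOG_EMIT = {
    "Healthy": {"normal": math.log(0.5), "cold": math.log(0.4), "dizzy": math.log(0.1)},
    "Fever": {"normal": math.log(0.1), "cold": math.log(0.3), "dizzy": math.log(0.6)},
}
OBS = ["normal", "cold", "dizzy"]
```

Calling viterbi_beam(OBS, STATES, LOG_START, LOG_TRANS, LOG_EMIT) runs the exact search, while passing a finite beam (for example beam=1.0) discards low-scoring hypotheses at each frame. A tighter beam reduces the work per frame at the risk of pruning the path that would eventually have won, which is the speed versus WER trade-off the abstract describes.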
Keywords :
Viterbi decoding; acoustic signal processing; real-time systems; search problems; speech coding; speech recognition; vector quantisation; CPU time; Gaussian evaluation; N-best strategies; Viterbi based system; acoustic model; acoustic model processing; application prototypes; decoding problem solution; near-real-time speed; fast-matching strategies; hierarchical search; intelligent pruning algorithms; large-vocabulary conversational speech recognition; look-ahead strategies; memory; multi-pass systems; performance; real-time implementation; research; search engine; search problem; speech recognition search strategies; system architecture; vector quantization; word error rate; Art; Costs; Encoding; Hidden Markov models; Merging; Optimized production technology; Programmable control; Resource management; Soil