Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper surveys recent approaches to front-end processing, acoustic modeling, language modeling, and back-end search and system combination that have advanced large vocabulary continuous speech recognition (LVCSR) systems. In front-end processing, these include feature transformations, speaker-adaptive features, and discriminative features; in acoustic modeling, feature-space and model-space discriminative training, deep neural networks, and speaker adaptation; in language modeling, backoff smoothing, large-span modeling, and model regularization; and in search and system combination, system combination, cross-adaptation, and boosting. Some future directions for LVCSR research are also discussed.
Keywords :
feature extraction; neural nets; speech recognition; LVCSR; acoustic modeling; back-end search; discriminative features; feature transformations; front-end processing; language modeling; large vocabulary continuous speech recognition; model regularization; neural networks; speaker adaptation; speaker-adaptive features; Acoustics; Adaptation models; Data models; Hidden Markov models; Speech; Training; Vectors