Title :
From flat direct models to segmental CRF models
Author :
Zweig, Geoffrey ; Nguyen, Patrick
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Abstract :
This paper summarizes recent work at Microsoft on the development of novel direct models. The key characteristic of our approaches is the use of long-span, segment-level features that relate acoustic properties directly to words. In this approach, the frame-level Markov assumption is replaced by the segment-level Markov property, allowing us to extract long-span features. A key issue we address is the definition of generalizable features which allow us to model unseen words. We review two recently developed models that have this property: Flat Direct Models (FDMs) and Segmental CRFs (SCRFs). The first operates in a log-linear framework and uses utterance-level features. The second is also a log-linear model, but defines features at the word-segment level. We present new experimental results comparing the two approaches. We find that both show consistent improvements over a baseline system, and that the extra context available to the FDM enables slightly better performance in a rescoring context. This gain comes at the expense of applicability to first-pass decoding, for which the SCRF is better suited.
Keywords :
Markov processes; probability; speech recognition; Microsoft; first pass decoding; flat direct models; frame level Markov assumption; log linear model; segment level Markov property; segmental CRF models; Business communication; Cellular phones; Context modeling; Decoding; Entropy; Feature extraction; Hidden Markov models; Natural language processing; Speech recognition; Training data; Flat Direct Model; Segmental CRF; Speech Recognition; Voice Search;
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495221