DocumentCode :
2973747
Title :
Self-supervised discriminative training of statistical language models
Author :
Xu, Puyang ; Karakos, Damianos ; Khudanpur, Sanjeev
Author_Institution :
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2009
fDate :
Dec. 13 2009-Dec. 17 2009
Firstpage :
317
Lastpage :
322
Abstract :
A novel self-supervised discriminative training method for estimating language models for automatic speech recognition (ASR) is proposed. Unlike traditional discriminative training methods, which require transcribed speech, only untranscribed speech and a large text corpus are required. An exponential form is assumed for the language model, as in maximum entropy estimation, but the model is trained from the text using a discriminative criterion that targets word confusions actually witnessed in first-pass ASR output lattices. Specifically, model parameters are estimated to maximize the likelihood ratio between each word w in the text corpus and w's cohorts in the test speech, i.e., the other words with which w competes in the test lattices. Empirical results demonstrate statistically significant improvements over a 4-gram language model on a large-vocabulary ASR task.
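The training criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes toy unigram features (one weight per word) and a hand-written cohort map standing in for the confusion sets extracted from first-pass ASR lattices. For each corpus word w, the objective is the log likelihood ratio of w against w plus its cohorts under an exponential model, maximized by plain gradient ascent.

```python
import math
from collections import defaultdict

# Hypothetical cohort sets: for each word w in the text corpus, the words
# it competes with in first-pass ASR output lattices. In the paper these
# are harvested from test-speech lattices; here they are made up.
cohorts = {
    "see": ["sea", "si"],
    "two": ["to", "too"],
}
text_corpus = ["see", "two", "see"]

# Exponential model with toy unigram features: score(w) = exp(theta_w).
theta = defaultdict(float)

def score(w, weights):
    return math.exp(weights[w])

def objective(weights):
    # Sum over corpus words of log [ score(w) / (score(w) + sum of cohort scores) ],
    # i.e., the log likelihood ratio of w against its confusion set.
    total = 0.0
    for w in text_corpus:
        denom = score(w, weights) + sum(score(c, weights) for c in cohorts[w])
        total += math.log(score(w, weights) / denom)
    return total

# Gradient ascent: the gradient of the softmax-style objective is
# (1 - p_w) for the correct word and -p_c for each cohort c.
for _ in range(200):
    grad = defaultdict(float)
    for w in text_corpus:
        denom = score(w, theta) + sum(score(c, theta) for c in cohorts[w])
        grad[w] += 1.0 - score(w, theta) / denom
        for c in cohorts[w]:
            grad[c] -= score(c, theta) / denom
    for k, g in grad.items():
        theta[k] += 0.1 * g
```

After training, the weights of corpus words rise relative to their cohorts, so the objective strictly improves over the uniform (all-zero) initialization. The real method uses richer n-gram features and lattice posteriors, but the ratio-maximizing structure is the same.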
Keywords :
computational linguistics; maximum entropy methods; maximum likelihood estimation; speech recognition; automatic speech recognition; maximum entropy estimation; maximum likelihood ratio; self-supervised discriminative training; statistical language model; Automatic speech recognition; Entropy; Humans; Lattices; Maximum likelihood estimation; Natural languages; Parameter estimation; Speech processing; Speech recognition; Testing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano, Italy
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373401
Filename :
5373401