DocumentCode :
1409753
Title :
Using out-of-domain data to improve in-domain language models
Author :
Iyer, Rukmini ; Ostendorf, Mari ; Gish, Herb
Author_Institution :
Coll. of Eng., Boston Univ., MA, USA
Volume :
4
Issue :
8
Year :
1997
Firstpage :
221
Lastpage :
223
Abstract :
Standard statistical language modeling techniques suffer from sparse data problems when applied to real tasks in speech recognition, where large amounts of domain-dependent text are not available. We investigate new approaches to improve sparse application-specific language models by combining domain-dependent and out-of-domain data, including a back-off scheme that effectively leads to context-dependent multiple interpolation weights, and a likelihood-based similarity weighting scheme to discriminatively use data to train a task-specific language model. Experiments with both approaches on a spontaneous speech recognition task (Switchboard) lead to reduced word error rate over a domain-specific n-gram language model, giving a larger gain than that obtained with previous brute-force data combination approaches.
Keywords :
grammars; interpolation; maximum likelihood estimation; natural languages; speech processing; speech recognition; statistical analysis; back-off scheme; brute force data combination; context dependent multiple interpolation weights; domain dependent data; domain dependent text; domain specific n-gram language model; experiments; in-domain language models; likelihood based similarity weighting; out of domain data; sparse application specific language models; sparse data problems; speech recognition; spontaneous speech recognition task; statistical language modeling; switchboard task; word error rate; Context modeling; Error analysis; Interpolation; Markov processes; Natural languages; Parameter estimation; Performance gain; Probability; Smoothing methods; Speech recognition;
Language :
English
Journal_Title :
Signal Processing Letters, IEEE
Publisher :
IEEE
ISSN :
1070-9908
Type :
jour
DOI :
10.1109/97.611282
Filename :
611282