DocumentCode :
3531045
Title :
Resampling auxiliary data for language model adaptation in machine translation for speech
Author :
Maskey, Sameer ; Sethy, Abhinav
Author_Institution :
IBM T.J. Watson Res. Center, New York, NY
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4817
Lastpage :
4820
Abstract :
Performance of n-gram language models depends to a large extent on the amount of training text available for building the models and on the degree to which this text matches the domain of interest. The language modeling community is showing growing interest in using large collections of auxiliary text to supplement sparse in-domain resources. One problem with such auxiliary corpora is that they may differ significantly from the domain of interest. In this paper, we propose three methods for adapting language models for a speech-to-speech (S2S) translation system when the auxiliary corpora differ in genre and domain. The proposed methods are based on centroid similarity, n-gram ratios, and resampled language models. We show how these methods can be used to select out-of-domain text, such as newswire, to improve an S2S system. We achieve an overall relative improvement of 3.8% in BLEU score over a baseline system that uses only in-domain conversational data.
Keywords :
language translation; speech processing; auxiliary data resampling; language model adaptation; language modeling community; machine translation; n-gram language models; speech to speech translation system; Adaptation model; Entropy; Materials testing; Natural languages; Performance gain; Speech coding; Support vector machine classification; Support vector machines; System testing; Text categorization; Domain Adaptation; Language Model Adaptation; Machine Translation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960709
Filename :
4960709