Title :
Cross-lingual latent semantic analysis for language modeling
Author :
Kim, Woosung ; Khudanpur, Sanjeev
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques that exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that cross-lingual language model adaptation does not rely on the availability of any machine translation capability. Instead, we assume only that a modest-sized collection of story-aligned document pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique.
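The two steps named in the abstract, (i) retrieval through a shared low-dimensional space and (ii) extraction of statistics for adaptation, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The shared space is built with a truncated SVD of a term-document matrix whose columns merge both sides of each story-aligned pair (a common cross-lingual LSA construction). A monolingual document is folded in as d^T U_k S_k^{-1} and compared by cosine similarity. Step (ii) uses a hypothetical similarity-based translation distribution, P(c|e) proportional to max(cos(c, e), 0)^gamma, followed by linear interpolation with a background unigram. All function names, the `gamma` and `lam` parameters, the `en:`/`zh:` token prefixes, and the toy data are invented for illustration, and treating English as the resource-rich language is likewise an assumption.

```python
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term counts: rows are words in `vocab`, columns are documents."""
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            if w in index:
                m[index[w], j] += 1.0
    return m


def train_cl_lsa(aligned_pairs, vocab, k):
    """Truncated SVD of a matrix whose columns merge both sides of each aligned pair."""
    merged = [src + tgt for src, tgt in aligned_pairs]
    u, s, _ = np.linalg.svd(term_doc_matrix(merged, vocab), full_matrices=False)
    return u[:, :k], s[:k]


def fold_in(doc, vocab, u_k, s_k):
    """Project a monolingual document into the shared space: d^T U_k S_k^{-1}."""
    d = term_doc_matrix([doc], vocab)[:, 0]
    return (d @ u_k) / s_k


def cosine(x, y):
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y) / denom if denom > 0 else 0.0


def retrieve(story, candidates, vocab, u_k, s_k, top_n):
    """Step (i): rank resource-rich-language documents by similarity to the story."""
    q = fold_in(story, vocab, u_k, s_k)
    scores = [cosine(q, fold_in(c, vocab, u_k, s_k)) for c in candidates]
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]


def cross_lingual_unigram(retrieved, vocab, u_k, s_k, src_prefix, tgt_prefix, gamma=3.0):
    """Hypothetical step (ii): P(c) = sum_e P(c|e) P(e | retrieved documents),
    with P(c|e) proportional to max(cos(c, e), 0) ** gamma in the LSA space."""
    word_vecs = u_k * s_k  # one row per vocabulary word
    tgt = [i for i, w in enumerate(vocab) if w.startswith(tgt_prefix)]
    counts = {}
    for doc in retrieved:
        for w in doc:
            if w.startswith(src_prefix) and w in vocab:
                counts[w] = counts.get(w, 0) + 1
    probs = np.zeros(len(tgt))
    for e, n in counts.items():
        ev = word_vecs[vocab.index(e)]
        sims = np.array([max(cosine(word_vecs[i], ev), 0.0) ** gamma for i in tgt])
        if sims.sum() > 0:
            probs += n * sims / sims.sum()
    z = probs.sum()
    probs = probs / z if z > 0 else np.full(len(tgt), 1.0 / len(tgt))
    return {vocab[i]: float(p) for i, p in zip(tgt, probs)}


def interpolate(adapted, background, lam):
    """Linear interpolation; `lam` weights the story-adapted model."""
    words = set(adapted) | set(background)
    return {w: lam * adapted.get(w, 0.0) + (1 - lam) * background.get(w, 0.0) for w in words}


# Toy, hypothetical data: "en:" tokens stand in for the resource-rich language,
# "zh:" pinyin tokens for Mandarin. Two topics: sports and finance.
pairs = [
    (["en:goal", "en:match", "en:team"], ["zh:jinqiu", "zh:bisai", "zh:qiudui"]),
    (["en:stock", "en:market", "en:price"], ["zh:gupiao", "zh:shichang", "zh:jiage"]),
    (["en:team", "en:match", "en:coach"], ["zh:qiudui", "zh:bisai", "zh:jiaolian"]),
    (["en:market", "en:price", "en:trade"], ["zh:shichang", "zh:jiage", "zh:maoyi"]),
]
vocab = sorted({w for src, tgt in pairs for w in src + tgt})
u_k, s_k = train_cl_lsa(pairs, vocab, k=2)

story = ["zh:bisai", "zh:qiudui"]  # a Mandarin "story" about a match
english_docs = [["en:stock", "en:price"], ["en:match", "en:team", "en:goal"]]
top = retrieve(story, english_docs, vocab, u_k, s_k, top_n=1)
p_cl = cross_lingual_unigram([english_docs[i] for i in top], vocab, u_k, s_k, "en:", "zh:")
zh_words = [w for w in vocab if w.startswith("zh:")]
background = {w: 1.0 / len(zh_words) for w in zh_words}
p_adapted = interpolate(p_cl, background, lam=0.5)
print(top)  # the sports document (index 1) should rank first
```

In this toy setting the two topics never share words, so every in-topic word pair has a cosine near 1 and the adapted distribution spreads its mass roughly evenly over the four sports words. A realistic system would likely need far larger corpora, term weighting, and n-gram rather than unigram statistics.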
Keywords :
computational linguistics; error statistics; natural languages; semantic networks; speech processing; speech recognition; statistical analysis; cross-lingual language model adaptation; cross-lingual latent semantic analysis; document statistics extraction; domain-specific text; error rate reduction; language modeling; low-dimensional representation; natural language processing; perplexity reduction; resource-deficient language; resource-rich language; speech processing; speech recognition; statistical language model estimation; story-aligned document-pairs; Adaptation model; Automatic speech recognition; Availability; Error analysis; Information retrieval; Natural language processing; Natural languages; Speech processing; Speech recognition; Statistical analysis;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1325971