DocumentCode :
2267249
Title :
Word clustering with parallel spoken language corpora
Author :
Wang, Ye-Yi ; Lafferty, John ; Waibel, Alex
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume :
4
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
2364
Abstract :
We introduce a word clustering algorithm that uses a bilingual parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
Keywords :
language translation; natural languages; speech processing; statistical analysis; word processing; bilingual data; bilingual parallel corpus; machine translation tasks; monolingual data; mutual information clustering algorithms; parallel spoken language corpora; statistical translation model; word clustering algorithm; Books; Bridges; Clustering algorithms; Data mining; Entropy; Greedy algorithms; Merging; Mutual information; Natural languages; Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607283
Filename :
607283