Title :
Word clustering with parallel spoken language corpora
Author :
Wang, Ye-Yi ; Lafferty, John ; Waibel, Alex
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
We introduce a word clustering algorithm that uses a bilingual, parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments show that the algorithm can effectively exploit the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
Keywords :
language translation; natural languages; speech processing; statistical analysis; word processing; bilingual data; bilingual parallel corpus; machine translation tasks; monolingual data; mutual information clustering algorithms; parallel spoken language corpora; statistical translation model; word clustering algorithm; Books; Bridges; Clustering algorithms; Data mining; Entropy; Greedy algorithms; Merging; Mutual information; Natural languages; Scheduling
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607283