DocumentCode :
3648528
Title :
Cross-lingual document similarity
Author :
Andrej Muhič;Jan Rupnik;Primož Škraba
Author_Institution :
A.I. Laboratory, Jozef Stefan Institute, Jamova 39, 10000 Ljubljana, Slovenia
fYear :
2012
fDate :
6/1/2012 12:00:00 AM
Firstpage :
387
Lastpage :
392
Abstract :
In this paper we investigated how to compute similarities between documents written in different languages, based on a weakly aligned multilingual collection of documents. Cross-lingual similarities are computed using an aligned set of basis vectors, obtained by applying either latent semantic indexing or the k-means algorithm to an aligned multilingual corpus. We evaluated the methods on two data sets: Wikipedia and the European Parliament Proceedings Parallel Corpus.
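The abstract's core idea, building an aligned basis from an aligned corpus and comparing documents in the shared space, can be illustrated with a minimal sketch. It assumes only two languages and term-document matrices `X_a`, `X_b` whose j-th columns describe the same (aligned) document. The LSI variant is the classic cross-language LSI construction (truncated SVD of the vertically stacked matrices). The k-means variant runs plain Lloyd iterations on length-normalised stacked documents and splits each centroid into per-language blocks. All function names are hypothetical, and the paper's actual weighting, normalisation and multi-language handling are not reproduced here.

```python
import numpy as np


def aligned_lsi_basis(X_a, X_b, k):
    """Cross-language LSI: SVD of the stacked matrix, split into per-language bases.

    X_a: (terms_a, docs), X_b: (terms_b, docs); column j is the same document in both.
    """
    stacked = np.vstack([X_a, X_b])
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    U_k = U[:, :k]  # top-k left singular vectors of the joint space
    n_a = X_a.shape[0]
    return U_k[:n_a], U_k[n_a:]  # aligned basis blocks for language A and B


def aligned_kmeans_basis(X_a, X_b, k, iters=50, seed=0):
    """k-means on stacked, length-normalised documents; centroids split per language."""
    stacked = np.vstack([X_a, X_b]).astype(float)
    norms = np.linalg.norm(stacked, axis=0)
    norms[norms == 0] = 1.0
    P = stacked / norms  # unit-length aligned documents (columns)
    rng = np.random.default_rng(seed)
    C = P[:, rng.choice(P.shape[1], size=k, replace=False)].copy()
    for _ in range(iters):
        # squared Euclidean distance of every document to every centroid: (docs, k)
        d2 = ((P[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = P[:, labels == j]
            if members.shape[1] > 0:  # keep the old centroid if a cluster empties
                C[:, j] = members.mean(axis=1)
    n_a = X_a.shape[0]
    return C[:n_a], C[n_a:]


def cross_lingual_similarity(d_a, d_b, B_a, B_b):
    """Cosine similarity of two documents after projecting onto aligned bases."""
    z_a = B_a.T @ d_a
    z_b = B_b.T @ d_b
    denom = np.linalg.norm(z_a) * np.linalg.norm(z_b)
    return float(z_a @ z_b / denom) if denom > 0 else 0.0
```

In a toy corpus where language B lists its vocabulary in a different row order, a document and its translation project to the same coordinates, because the alignment is learned from co-occurrence across aligned documents rather than from shared term indices. Truncated SVD yields an orthonormal basis; the k-means centroids are not orthogonal, so projecting onto them acts as a soft "concept" representation rather than a true change of basis.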
Keywords :
"Europe","Information services","Electronic publishing","Internet"
Publisher :
IEEE
Conference_Titel :
Information Technology Interfaces (ITI), Proceedings of the ITI 2012 34th International Conference on
ISSN :
1334-2762
Print_ISBN :
978-1-4673-1629-3
Type :
conf
DOI :
10.2498/iti.2012.0467
Filename :
6308038