Title :
Turkish — English cross language information retrieval using LSI
Author :
Erbug Celebi;Baturman Sen;Burak Gunel
Author_Institution :
Cyprus International University, Department of Computer Engineering, Lefkoşa, TRNC, Mersin 10, Turkey
Abstract :
This paper describes a study of a Turkish-English cross-language information retrieval (CLIR) system. One of the biggest obstacles in CLIR studies is access to a bilingual parallel corpus, so the first step of this study was to construct a parallel Turkish-English corpus. We have constructed a corpus of 1801 parallel documents and divided it into two parts: the first for training the system and the second for testing it. Latent semantic indexing (LSI) techniques were applied to the training set to capture the relations between the two languages. After training, we performed a set of tests (queries) to measure the effectiveness of LSI-based retrieval on the Turkish-English parallel corpus. Our experimental results show that LSI-based CLIR outperforms non-LSI-based retrieval, with retrieval success rates of 69% and 26%, respectively.
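The abstract does not spell out how LSI links the two languages. A common cross-language LSI formulation, not confirmed here as this paper's exact method, merges each Turkish-English training pair into one document, takes a truncated SVD of the resulting term-document matrix, and folds monolingual queries and documents into that shared space (q^T U_k S_k^{-1}) before ranking by cosine similarity. The sketch below assumes raw term counts, whitespace tokenization, a hypothetical four-pair toy corpus (PAIRS), and hypothetical function names (train_lsi, fold_in, retrieve). The paper's term weighting, number of dimensions k, and similarity measure are not given in this record.

```python
import numpy as np


def build_vocab(docs):
    """Map every distinct lower-cased token in docs to a row index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def term_vector(text, vocab):
    """Raw term-count vector; tokens outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v


def train_lsi(parallel_pairs, k):
    """Build a bilingual LSI space from (turkish, english) training pairs.

    Each pair is merged into a single document, so Turkish and English terms
    that co-occur across pairs end up close together after the truncated SVD.
    (Hypothetical helper; assumes raw counts, no tf-idf or other weighting.)
    """
    merged = [tr + " " + en for tr, en in parallel_pairs]
    vocab = build_vocab(merged)
    A = np.column_stack([term_vector(d, vocab) for d in merged])  # terms x docs
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    return vocab, U[:, :k], s[:k]


def fold_in(text, vocab, U_k, s_k):
    """Project a monolingual text into the k-dim space: q^T U_k S_k^{-1}."""
    return term_vector(text, vocab) @ U_k / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def retrieve(query, target_docs, vocab, U_k, s_k):
    """Return indices of target-language documents, best match first."""
    q = fold_in(query, vocab, U_k, s_k)
    scores = [cosine(q, fold_in(d, vocab, U_k, s_k)) for d in target_docs]
    return sorted(range(len(target_docs)), key=lambda i: -scores[i])


# Hypothetical toy training pairs (Turkish, English); not from the paper's corpus.
PAIRS = [
    ("kedi süt içer", "cat drinks milk"),
    ("köpek kemik yer", "dog eats bone"),
    ("kedi fare yakalar", "cat catches mouse"),
    ("köpek havlar", "dog barks"),
]

vocab, U_k, s_k = train_lsi(PAIRS, k=2)
# Turkish query "kedi" (cat) against two English-only target documents.
ranking = retrieve("kedi", ["dog eats bone", "cat drinks milk"], vocab, U_k, s_k)
print(ranking)  # → [1, 0]
```

In this toy run the Turkish query "kedi" shares no terms with either English document, so plain term matching scores both at zero; in the LSI space it lands closest to "cat drinks milk" because "kedi" and "cat" co-occur in the training pairs. This is the kind of gap between LSI-based and non-LSI-based retrieval that the abstract reports, though the toy numbers say nothing about the paper's measured 69% and 26%.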
Keywords :
"Natural languages","Information retrieval","Large scale integration","Dictionaries","Indexing","System testing","Performance evaluation","Terminology","Filters","Statistics"
Conference_Title :
Computer and Information Sciences, 2009. ISCIS 2009. 24th International Symposium on
Print_ISBN :
978-1-4244-5021-3
DOI :
10.1109/ISCIS.2009.5291896