DocumentCode :
2891559
Title :
Evaluating a Cross-Language Semantically Enriched Search Engine
Author :
Zhuhadar, Leyla ; Nasraoui, Olfa
Author_Institution :
Dept. of Comput. Eng. & Comput. Sci., Univ. of Louisville, Louisville, KY, USA
fYear :
2010
fDate :
12-14 April 2010
Firstpage :
1074
Lastpage :
1079
Abstract :
This paper tackles the problem of a user who can read or use documents written in a specific language but is not fluent enough in that language to choose the right query terms to find them. The design of Cross-Language Information Retrieval systems dates back to 1969, when Gerard Salton enhanced his SMART system to retrieve documents in multiple languages, English and Spanish; however, the translation process is still considered a challenging problem. This paper is devoted to the evaluation of a Cross-Language search engine that uses Natural Language Processing techniques to improve the search for documents available in two languages, English and Spanish. The research is implemented and evaluated on a real platform, HyperManyMedia, at Western Kentucky University. The implementation of the Cross-Language search engine combines two approaches synergistically: (1) a Thesaurus-based Approach and (2) a Corpus-based Approach. In the Thesaurus-based Approach, we use a simple bilingual listing of terms, phrases, concepts, and subconcepts, where the hierarchical structure of the ontology defines the relationships between concepts and subconcepts. We also use a specific terminology that captures the domain of E-learning; these terms are associated with college names, course names, and lecture names, each presented in both languages. In the Corpus-based Approach, we use Term Vector Translation; the goal is to find statistical information about term usage across the two languages using techniques that map sets of term weights from English to Spanish and vice versa.
Keywords :
computer aided instruction; language translation; natural language processing; ontologies (artificial intelligence); query processing; search engines; thesauri; E-learning; HyperManyMedia; SMART system; Western Kentucky University; corpus-based approach; cross-language information retrieval systems; cross-language semantically enriched search engine; hierarchical ontology structure; natural language processing techniques; query terms; specific language; term vector translation; thesaurus-based approach; translation process; Information retrieval; Information technology; Knowledge engineering; Knowledge representation; Natural language processing; Natural languages; Ontologies; Search engines; Speech processing; Web mining; cross-language; evaluation; information retrieval; ontology; search engine; semantic web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology: New Generations (ITNG), 2010 Seventh International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-6270-4
Type :
conf
DOI :
10.1109/ITNG.2010.237
Filename :
5501488