Title :
Using Hyperlink Texts to Improve Quality of Identifying Document Topics Based on Wikipedia
Author :
Huynh, Dat T. ; Cao, Tru H. ; Pham, Phuong H T ; Hoang, Toan N.
Author_Institution :
Fac. of Comput. Sci. & Eng., Ho Chi Minh City Univ. of Technol., Ho Chi Minh City, Vietnam
Abstract :
This paper presents a method for identifying the topics of documents based on the Wikipedia category network. It improves on the method previously proposed by Schonhofen by taking into account the weights of words in the hyperlink texts of Wikipedia articles. Experiments in the computing and team sport domains show that the proposed method outperforms Schonhofen's.
Keywords :
information retrieval; text analysis; Wikipedia articles; Wikipedia category network; computing domains; document topic identification; hyperlink texts; team sport domains; Computer science; Crawlers; Humans; Knowledge engineering; Machine learning; Machine learning algorithms; Ontologies; Systems engineering and theory; Web sites; Wikipedia
Conference_Title :
Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4244-5086-2
Electronic_ISBN :
978-0-7695-3846-4
DOI :
10.1109/KSE.2009.20