DocumentCode :
507486
Title :
Using Hyperlink Texts to Improve Quality of Identifying Document Topics Based on Wikipedia
Author :
Huynh, Dat T. ; Cao, Tru H. ; Pham, Phuong H T ; Hoang, Toan N.
Author_Institution :
Fac. of Comput. Sci. & Eng., Ho Chi Minh City Univ. of Technol., Ho Chi Minh City, Vietnam
fYear :
2009
fDate :
13-17 Oct. 2009
Firstpage :
249
Lastpage :
254
Abstract :
This paper presents a method for identifying the topics of documents based on the Wikipedia category network. It improves on the method previously proposed by Schonhofen by taking into account the weights of words in the hyperlink texts of Wikipedia articles. Experiments in the computing and team sport domains show that the proposed method outperforms Schonhofen's.
Keywords :
information retrieval; text analysis; Wikipedia articles; Wikipedia category network; computing domains; document topic identification; hyperlink texts; team sport domains; Computer science; Crawlers; Humans; Knowledge engineering; Machine learning; Machine learning algorithms; Ontologies; Systems engineering and theory; Web sites; Wikipedia
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4244-5086-2
Electronic_ISBN :
978-0-7695-3846-4
Type :
conf
DOI :
10.1109/KSE.2009.20
Filename :
5361697