DocumentCode
507486
Title
Using Hyperlink Texts to Improve Quality of Identifying Document Topics Based on Wikipedia
Author
Huynh, Dat T. ; Cao, Tru H. ; Pham, Phuong H T ; Hoang, Toan N.
Author_Institution
Fac. of Comput. Sci. & Eng., Ho Chi Minh City Univ. of Technol., Ho Chi Minh City, Vietnam
fYear
2009
fDate
13-17 Oct. 2009
Firstpage
249
Lastpage
254
Abstract
This paper presents a method for identifying the topics of documents based on the Wikipedia category network. It improves on the method previously proposed by Schonhofen by taking into account the weights of words in hyperlink texts in Wikipedia articles. Experiments in the computing and team sport domains show that the proposed method outperforms Schonhofen's.
Keywords
information retrieval; text analysis; Wikipedia articles; Wikipedia category network; computing domains; document topic identification; hyperlink texts; team sport domains; Computer science; Crawlers; Humans; Knowledge engineering; Machine learning; Machine learning algorithms; Ontologies; Systems engineering and theory; Web sites; Wikipedia
fLanguage
English
Publisher
IEEE
Conference_Titel
Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4244-5086-2
Electronic_ISBN
978-0-7695-3846-4
Type
conf
DOI
10.1109/KSE.2009.20
Filename
5361697