DocumentCode
573216
Title
Automatic Document Topic Identification using Wikipedia Hierarchical Ontology
Author
Hassan, Mostafa M. ; Karray, Fakhri ; Kamel, Mohamed S.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fYear
2012
fDate
2-5 July 2012
Firstpage
237
Lastpage
242
Abstract
The rapid growth in the number of documents available to end users from around the world has greatly increased the need for machine understanding of their topics and for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed to automatically construct a background knowledge structure, in the form of a hierarchical ontology, from one of the largest online knowledge repositories: Wikipedia. A novel approach is then presented to automatically identify documents' topics based on the proposed Wikipedia Hierarchical Ontology (WHO). Results show that the proposed model is efficient in identifying documents' topics and is promising, as it achieves higher accuracy than other conventional document clustering algorithms.
Keywords
Web sites; data mining; ontologies (artificial intelligence); text analysis; WHO; Wikipedia hierarchical ontology; automatic document topic identification; background knowledge structure; document clustering; largest online knowledge repositories; machine understanding; text mining; Accuracy; Electronic publishing; Encyclopedias; Entropy; Internet; Ontologies
fLanguage
English
Publisher
ieee
Conference_Titel
Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
Conference_Location
Montreal, QC
Print_ISBN
978-1-4673-0381-1
Electronic_ISBN
978-1-4673-0380-4
Type
conf
DOI
10.1109/ISSPA.2012.6310552
Filename
6310552