Title :
Automatic Document Topic Identification using Wikipedia Hierarchical Ontology
Author :
Hassan, Mostafa M. ; Karray, Fakhri ; Kamel, Mohamed S.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
The rapid growth in the number of documents available to end users around the world has greatly increased the need for machine understanding of their topics and for automatic grouping of related documents. This is one of the main current challenges in text mining. In this work, a novel technique is proposed to automatically construct a background knowledge structure, in the form of a hierarchical ontology, from one of the largest online knowledge repositories: Wikipedia. A novel approach is then presented to automatically identify documents' topics based on the proposed Wikipedia Hierarchical Ontology (WHO). Results show that the proposed model is efficient in identifying documents' topics and is promising, as it achieves higher accuracy than conventional document clustering algorithms.
Keywords :
Web sites; data mining; ontologies (artificial intelligence); text analysis; WHO; Wikipedia hierarchical ontology; automatic document topic identification; background knowledge structure; document clustering; largest online knowledge repositories; machine understanding; text mining; Accuracy; Electronic publishing; Encyclopedias; Entropy; Internet; Ontologies
Conference_Title :
Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4673-0381-1
Electronic_ISBN :
978-1-4673-0380-4
DOI :
10.1109/ISSPA.2012.6310552