• DocumentCode
    573216
  • Title
    Automatic Document Topic Identification using Wikipedia Hierarchical Ontology
  • Author
    Hassan, Mostafa M. ; Karray, Fakhri ; Kamel, Mohamed S.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2012
  • fDate
    2-5 July 2012
  • Firstpage
    237
  • Lastpage
    242
  • Abstract
    The rapid growth in the number of documents available to end users from around the world has led to a greatly increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed to automatically construct a background knowledge structure in the form of a hierarchical ontology, using one of the largest online knowledge repositories: Wikipedia. Then, a novel approach is presented to automatically identify documents' topics based on the proposed Wikipedia Hierarchical Ontology (WHO). Results show that the proposed model is efficient in identifying documents' topics, and promising, as it achieves higher accuracy than conventional document clustering algorithms.
  • Keywords
    Web sites; data mining; ontologies (artificial intelligence); text analysis; WHO; Wikipedia hierarchical ontology; automatic document topic identification; background knowledge structure; document clustering; largest online knowledge repositories; machine understanding; text mining; Accuracy; Electronic publishing; Encyclopedias; Entropy; Internet; Ontologies;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
  • Conference_Location
    Montreal, QC
  • Print_ISBN
    978-1-4673-0381-1
  • Electronic_ISBN
    978-1-4673-0380-4
  • Type
    conf
  • DOI
    10.1109/ISSPA.2012.6310552
  • Filename
    6310552