  • DocumentCode
    480702
  • Title
    A Concept-Driven Automatic Ontology Generation Approach for Conceptualization of Document Corpora
  • Author
    Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee
  • Author_Institution
    Biomed. Knowledge Eng. Lab., Seoul Nat. Univ., Seoul
  • Volume
    1
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    352
  • Lastpage
    358
  • Abstract
    In an age of increasing information availability, many techniques, such as document clustering and information visualization, have been developed to help users understand information. However, most of these methods do not help users directly understand the key concepts in a document corpus and their semantic relationships, which are critical for capturing the corpus's conceptual structure. Therefore, we propose a novel approach called "Clonto" that identifies key concepts and automatically generates an ontology based on these concepts to conceptualize a document corpus. Clonto applies latent semantic analysis to identify key concepts, allocates documents based on these concepts, and uses WordNet to automatically generate a corpus-related ontology. The documents are linked to the ontology through the key concepts. The experimental results show that Clonto identifies key concepts with high precision and that its clustering results outperform those of the STC (Suffix Tree Clustering) algorithm, the Lingo clustering algorithm, the Fuzzy Ants clustering algorithm, and clustering based on TRS (Tolerance Rough Set). Moreover, on the same document corpus, the ontology generated by Clonto exhibits a highly informative conceptual structure.
  • Keywords
    document handling; information retrieval; Clonto; WordNet; concept-driven automatic ontology generation; corpus-related ontology; document clustering; document corpora conceptualization; document corpus; information availability; information visualization; informative conceptual structure; key concept identification; latent semantic analysis; Clustering algorithms; Displays; Fuzzy sets; Intelligent agent; Knowledge engineering; Laboratories; Ontologies; Semantic Web; Text analysis; Visualization; Clonto; Lingo; Ontology; Suffix Tree Clustering; Tolerance Rough Set;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.233
  • Filename
    4740471