Title :
A Concept-Driven Automatic Ontology Generation Approach for Conceptualization of Document Corpora
Author :
Zheng, Hai-Tao ; Borchert, Charles ; Kim, Hong-Gee
Author_Institution :
Biomed. Knowledge Eng. Lab., Seoul Nat. Univ., Seoul
Abstract :
In the age of increasing information availability, many techniques, such as document clustering and information visualization, have been developed to help users understand information. However, most of these methods do not help users directly understand the key concepts in document corpora and their semantic relationships, which are critical for capturing the corpora's conceptual structures. Therefore, we propose a novel approach called "Clonto" to identify key concepts and automatically generate ontologies based on these concepts for the conceptualization of document corpora. Clonto applies latent semantic analysis to identify key concepts, allocates documents based on these concepts, and uses WordNet to automatically generate a corpus-related ontology. The documents are linked to the ontology through the key concepts. The experimental results show that Clonto identifies key concepts with high precision and that its clustering results outperform the STC (Suffix Tree Clustering) algorithm, the Lingo clustering algorithm, the Fuzzy Ants clustering algorithm, and clustering based on TRS (Tolerance Rough Set). Moreover, on the same document corpus, the ontology generated by Clonto shows a significantly informative conceptual structure.
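The abstract names latent semantic analysis (LSA) as Clonto's key-concept step but gives no further detail. The following is therefore only a minimal Python sketch of the general LSA idea under stated assumptions, not the authors' method. It builds a raw term-frequency matrix over a hypothetical five-document toy corpus and approximates the matrix's leading singular vectors by power iteration with deflation, used here as a dependency-free stand-in for a full SVD. It then labels each latent concept with its highest-loading terms and allocates each document to the concept its term vector projects onto most strongly. The function names (`term_document_matrix`, `top_singular_triplets`, `key_concepts`), the parameters `k` and `terms_per_concept`, the absence of term weighting, and the single-word terms are all illustrative assumptions. The WordNet-based ontology construction is not shown.

```python
import math


def term_document_matrix(docs):
    """Raw term-frequency matrix: one row per term, one column per document.
    (Assumption: no tf-idf or other weighting, whitespace tokenization.)"""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            A[row[w]][j] += 1.0
    return vocab, A


def top_singular_triplets(A, k, iters=200):
    """Approximate the k leading singular triplets (sigma, u, v) of A
    using power iteration with deflation (a stand-in for a full SVD)."""
    m, n = len(A), len(A[0])
    R = [r[:] for r in A]  # residual matrix, deflated after each triplet
    triplets = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(n)]  # deterministic starting vector
        u, sigma = [0.0] * m, 0.0
        for _ in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
            norm_u = math.sqrt(sum(x * x for x in u)) or 1.0
            u = [x / norm_u for x in u]
            v = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]
            sigma = math.sqrt(sum(x * x for x in v))
            if sigma == 0.0:
                break  # residual is exhausted
            v = [x / sigma for x in v]
        # Singular vectors are defined up to sign; make the largest term loading positive.
        if max(u, key=abs) < 0:
            u, v = [-x for x in u], [-x for x in v]
        triplets.append((sigma, u, v))
        for i in range(m):
            for j in range(n):
                R[i][j] -= sigma * u[i] * v[j]
    return triplets


def key_concepts(docs, k=2, terms_per_concept=2):
    """Label each latent concept by its top-loading terms and allocate every
    document to the concept its term vector projects onto most strongly."""
    vocab, A = term_document_matrix(docs)
    triplets = top_singular_triplets(A, k)
    concepts = []
    for _, u, _ in triplets:
        ranked = sorted(range(len(vocab)), key=lambda i: -u[i])
        concepts.append([vocab[i] for i in ranked[:terms_per_concept]])
    allocation = []
    for j in range(len(docs)):
        scores = [sum(A[i][j] * u[i] for i in range(len(vocab))) for _, u, _ in triplets]
        allocation.append(max(range(len(triplets)), key=lambda c: scores[c]))
    return concepts, allocation


# Hypothetical toy corpus (not data from the paper): two topics with disjoint vocabularies.
docs = [
    "ontology concept wordnet",
    "ontology concept semantic",
    "ontology wordnet semantic concept",
    "cluster suffix tree",
    "cluster tree",
]
concepts, allocation = key_concepts(docs)
print(concepts)    # [['concept', 'ontology'], ['cluster', 'tree']]
print(allocation)  # [0, 0, 0, 1, 1]
```

Because the toy vocabularies are disjoint, each latent concept isolates one topic. Real corpora share terms across topics, which is where term weighting and the choice of `k` become important. A library SVD routine would normally replace `top_singular_triplets`.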
Keywords :
document handling; information retrieval; Clonto; WordNet; concept-driven automatic ontology generation; corpus-related ontology; document clustering; document corpora conceptualization; document corpus; information availability; information visualization; informative conceptual structure; key concept identification; latent semantic analysis; Clustering algorithms; Displays; Fuzzy sets; Intelligent agent; Knowledge engineering; Laboratories; Ontologies; Semantic Web; Text analysis; Visualization; Lingo; Ontology; Suffix Tree Clustering; Tolerance Rough Set
Conference_Title :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
DOI :
10.1109/WIIAT.2008.233