DocumentCode
480702
Title
A Concept-Driven Automatic Ontology Generation Approach for Conceptualization of Document Corpora
Author
Zheng, Hai-Tao ; Borchert, Charles ; Kim, Hong-Gee
Author_Institution
Biomed. Knowledge Eng. Lab., Seoul Nat. Univ., Seoul
Volume
1
fYear
2008
fDate
9-12 Dec. 2008
Firstpage
352
Lastpage
358
Abstract
In an age of increasing information availability, many techniques, such as document clustering and information visualization, have been developed to help users understand information. However, most of these methods do not directly help users understand the key concepts in document corpora and the semantic relationships among them, which are critical for capturing the conceptual structure of a corpus. Therefore, we propose a novel approach called 'Clonto' that identifies the key concepts and automatically generates ontologies based on these concepts for conceptualization of document corpora. Clonto applies latent semantic analysis to identify key concepts, allocates documents based on these concepts, and utilizes WordNet to automatically generate a corpus-related ontology. The documents are linked to the ontology through the key concepts. The experimental results show that Clonto identifies key concepts with high precision and that its clustering results outperform the STC (Suffix Tree Clustering) algorithm, the Lingo clustering algorithm, the Fuzzy Ants clustering algorithm, and clustering based on TRS (Tolerance Rough Set). Moreover, on the same document corpus, the ontology generated by Clonto shows a significantly informative conceptual structure.
Keywords
document handling; information retrieval; Clonto; WordNet; concept-driven automatic ontology generation; corpus-related ontology; document clustering; document corpora conceptualization; document corpus; information availability; information visualization; informative conceptual structure; key concept identification; latent semantic analysis; Clustering algorithms; Displays; Fuzzy sets; Intelligent agent; Knowledge engineering; Laboratories; Ontologies; Semantic Web; Text analysis; Visualization; Lingo; Ontology; Suffix Tree Clustering; Tolerance Rough Set;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-0-7695-3496-1
Type
conf
DOI
10.1109/WIIAT.2008.233
Filename
4740471