Title :
Clustering Ontology-enriched Graph Representation for Biomedical Documents based on Scale-Free Network Theory
Author :
Yoo, Illhoi ; Hu, Xiaohua
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA
Abstract :
In this paper we introduce a novel document clustering approach that solves some major problems of traditional document clustering approaches. Instead of depending on the traditional vector space model, this approach represents documents as graphs using domain knowledge in an ontology, because graphs can represent the semantic relationships among the concepts in documents. Based on scale-free network theory, our approach generates a model for each document cluster from the ontology-enriched graph representation by identifying k high-density subgraphs that capture the core semantic relationship information about each document cluster. Using these k high-density subgraphs, each document is assigned to a proper document cluster. Our extensive experimental results on MEDLINE articles show that our approach outperforms two leading document clustering algorithms, BiSecting K-means and CLUTO's vcluster. Moreover, our approach provides a meaningful explanation for document clustering through the generated models. This explanation helps users to understand the clustering results and the documents as a whole.
Keywords :
complex networks; document handling; medical administrative data processing; network theory (graphs); ontologies (artificial intelligence); pattern clustering; MEDLINE articles; biomedical documents; document clustering; domain knowledge; graph clustering; high density subgraphs; ontology-enriched graph representation; scale-free network theory; semantic relationships; Clustering algorithms; Educational institutions; Engineering profession; Information retrieval; Nearest neighbor searches; Neoplasms; Network theory (graphs); Ontologies; Text mining; Vocabulary; document clustering; graph clustering; ontology; scale-free network;
Conference_Title :
Intelligent Systems, 2006 3rd International IEEE Conference on
Conference_Location :
London
Print_ISBN :
1-4244-01996-8
Electronic_ISBN :
1-4244-01996-8
DOI :
10.1109/IS.2006.348532