DocumentCode :
3190006
Title :
GDClust: A Graph-Based Document Clustering Technique
Author :
Hossain, M. Shahriar ; Angryk, Rafal A.
Author_Institution :
Montana State Univ., Bozeman
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
417
Lastpage :
422
Abstract :
This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (graph-based document clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction for clustering.
Keywords :
Gaussian processes; data mining; graph theory; natural language processing; ontologies (artificial intelligence); pattern clustering; text analysis; English language ontology; apriori paradigm; candidate subgraph generation; frequent senses; frequent subgraphs; graph-based document clustering technique; multilevel Gaussian minimum support approach; sense discovery; text mining techniques; Association rules; Books; Chemical analysis; Chemical technology; Clustering algorithms; Computer science; Conferences; Data mining; Humans; Ontologies;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
Type :
conf
DOI :
10.1109/ICDMW.2007.104
Filename :
4476701