Title : 
GDClust: A Graph-Based Document Clustering Technique
         
        
            Author : 
Hossain, M. Shahriar ; Angryk, Rafal A.
         
        
            Author_Institution : 
Montana State Univ., Bozeman
         
        
        
        
        
        
            Abstract : 
This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system that requires minimal human interaction for clustering.
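
The Apriori-style search for frequent subgraphs mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each document-graph is simplified to a set of (parent, child) edges, a "subgraph" is any edge set (connectivity is not checked), and the names `gaussian_threshold` and `frequent_edge_sets`, as well as the Gaussian shape and its parameters, are assumptions made for illustration only. The abstract does not give the actual formula for the multilevel Gaussian minimum support.

```python
from itertools import combinations
from math import exp


def gaussian_threshold(k, center=2.0, sigma=1.0, peak=0.6):
    """Illustrative minimum support for k-edge candidates.

    Assumption: a Gaussian-shaped curve over candidate size k.
    The paper's real definition is not stated in the abstract.
    """
    return peak * exp(-((k - center) ** 2) / (2 * sigma ** 2))


def frequent_edge_sets(doc_graphs, min_support=gaussian_threshold, max_size=3):
    """Apriori-style search for edge sets shared by many document-graphs."""
    n = len(doc_graphs)

    def support(edge_set):
        # Fraction of document-graphs that contain every edge in edge_set.
        return sum(edge_set <= g for g in doc_graphs) / n

    # Level 1: single edges that meet the level-1 threshold.
    all_edges = set().union(*doc_graphs)
    level = {
        frozenset([e]) for e in all_edges
        if support(frozenset([e])) >= min_support(1)
    }

    result = {}
    k = 1
    while level and k <= max_size:
        for edge_set in level:
            result[edge_set] = support(edge_set)
        k += 1
        # Join step: merge pairs of frequent (k-1)-sets into k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself have been kept.
        # With a level-dependent threshold this can discard candidates
        # that a fixed threshold would have retained.
        level = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
            and support(c) >= min_support(k)
        }
    return result


# Toy document-graphs: edges point from a general sense to a specific one.
docs = [
    frozenset({("entity", "animal"), ("animal", "dog"), ("animal", "cat")}),
    frozenset({("entity", "animal"), ("animal", "dog")}),
    frozenset({("entity", "animal"), ("animal", "cat"), ("entity", "plant")}),
]
found = frequent_edge_sets(docs)
for edge_set, sup in sorted(found.items(), key=lambda item: -item[1]):
    print(sorted(edge_set), round(sup, 2))
```

In this toy input, the edge ("entity", "plant") appears in only one of three graphs and falls below the level-1 threshold, so no candidate containing it is ever generated. The surviving edge sets could then serve as features for grouping documents, which is the role the abstract assigns to frequent subgraphs.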
         
        
            Keywords : 
Gaussian processes; data mining; graph theory; natural language processing; ontologies (artificial intelligence); pattern clustering; text analysis; English language ontology; apriori paradigm; candidate subgraph generation; frequent senses; frequent subgraphs; graph-based document clustering technique; multilevel Gaussian minimum support approach; sense discovery; text mining techniques; Association rules; Books; Chemical analysis; Chemical technology; Clustering algorithms; Computer science; Conferences; Data mining; Humans; Ontologies;
         
        
        
        
            Conference_Title : 
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
         
        
            Conference_Location : 
Omaha, NE
         
        
            Print_ISBN : 
978-0-7695-3019-2
         
        
            Electronic_ISBN : 
978-0-7695-3033-8
         
        
        
            DOI : 
10.1109/ICDMW.2007.104