DocumentCode
1563751
Title
Fuzzy co-clustering of documents and keywords
Author
Kummamuru, Krishna ; Dhawale, Ajay ; Krishnapuram, Raghu
Author_Institution
IBM India Res. Lab., IIT, New Delhi, India
Volume
2
fYear
2003
Firstpage
772
Abstract
Conventional clustering algorithms such as K-means and SAHN (also known as AHC) have been well studied and used in the information retrieval community for clustering text documents. More recently, efforts have been made to cluster documents and words simultaneously. The FCCM algorithm due to Oh et al. is a fuzzy clustering algorithm that maximizes the co-occurrence of categorical attributes (keywords) and the individual patterns (documents) in clusters. However, this algorithm poses certain problems when the number of documents or the number of words is very large. In this paper, we modify the FCCM algorithm so that it can be used to cluster large text corpora. Our experiments show that the modified algorithm is scalable and produces meaningful clusters. We also show the relation between FCCM and the Spherical K-Means (SKM) algorithm and introduce the Spherical Fuzzy c-Means (SFCM) algorithm.
Keywords
fuzzy set theory; information retrieval; information retrieval systems; pattern clustering; text analysis; categorical attributes; categorical multivariate data; clustering text documents; conventional clustering algorithms; document co-clustering; fuzzy clustering algorithms; individual patterns; information retrieval community; keyword co-clustering; sequential agglomerative hierarchical nonoverlapping; spherical fuzzy c-means algorithm; spherical k-means algorithm; Bipartite graph; Clustering algorithms; Frequency shift keying; Information retrieval; Partitioning algorithms; Text mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on
Print_ISBN
0-7803-7810-5
Type
conf
DOI
10.1109/FUZZ.2003.1206527
Filename
1206527