Fuzzy co-clustering of documents and keywords

Author

Kummamuru, Krishna ; Dhawale, Ajay ; Krishnapuram, Raghu

Author_Institution

IBM India Res. Lab., IIT, New Delhi, India

Volume

2

fYear

2003

Firstpage

772

Abstract

Conventional clustering algorithms such as K-means and SAHN (also known as AHC) have been well studied and used in the information retrieval community for clustering text documents. More recently, efforts have been made to cluster documents and words simultaneously. The FCCM algorithm due to Oh et al. is a fuzzy clustering algorithm that maximizes the co-occurrence of categorical attributes (keywords) and the individual patterns (documents) in clusters. However, this algorithm poses certain problems when the number of documents or the number of words is very large. In this paper, we modify the FCCM algorithm so that it can be used to cluster large text corpora. Our experiments show that the modified algorithm is scalable and produces meaningful clusters. We also show the relation between FCCM and the Spherical K-Means (SKM) algorithm and introduce the Spherical Fuzzy c-Means (SFCM) algorithm.
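For orientation, the Spherical K-Means (SKM) algorithm mentioned in the abstract can be sketched as below. This is only an illustrative sketch of standard SKM (unit-length document vectors, cosine-similarity assignment, re-normalized centroids). It is not the modified FCCM algorithm or the SFCM algorithm proposed in the paper. The function names, the list-of-lists input format, and the parameters `iters` and `seed` are assumptions made for the example.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster term-weight vectors by cosine similarity.

    docs: list of equal-length lists of term weights (assumed input format).
    Returns (labels, centroids), where every centroid has unit length.
    """
    rng = random.Random(seed)
    unit_docs = [normalize(d) for d in docs]
    # Initialize centroids with k distinct documents chosen at random.
    centroids = [list(d) for d in rng.sample(unit_docs, k)]
    labels = [0] * len(unit_docs)
    for _ in range(iters):
        # Assignment step: for unit vectors the dot product equals the cosine.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(d, centroids[c])))
            for d in unit_docs
        ]
        # Update step: sum the members, then project back onto the unit sphere.
        for c in range(k):
            members = [d for d, lab in zip(unit_docs, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids
```

Calling `spherical_kmeans(docs, k=2)` on a small document-by-keyword weight matrix returns one cluster label per document and one unit-length centroid per cluster. A fuzzy variant would replace the hard assignment step with graded memberships.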

Keywords

fuzzy set theory; information retrieval; information retrieval systems; pattern clustering; text analysis; categorical attributes; categorical multivariate data; clustering text documents; conventional clustering algorithms; document co-clustering; fuzzy clustering algorithms; individual patterns; information retrieval community; keyword co-clustering; sequential agglomerative hierarchical nonoverlapping; spherical fuzzy c-means algorithm; spherical k-means algorithm; Bipartite graph; Clustering algorithms; Frequency shift keying; Information retrieval; Partitioning algorithms; Text mining

fLanguage

English

Publisher

IEEE

Conference_Titel

Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on

Print_ISBN

0-7803-7810-5

Type

conf

DOI

10.1109/FUZZ.2003.1206527

Filename

1206527