Title :
A New Cluster Validity Index Based on Fuzzy Granulation-degranulation Criteria
Author :
Saha, Sriparna ; Bandyopadhyay, Sanghamitra
Author_Institution :
Indian Stat. Inst., Kolkata
Abstract :
Identifying the correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named the fuzzy vector quantization based validity index (FVQ index). This index measures how well the formed cluster centers represent the data set. Most existing validity indices try to optimize the total variance of the partitioning, which is a measure of the compactness of the clusters so formed. Here, a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used as the measure of goodness of the obtained partitioning. This error function decreases monotonically as the number of clusters increases. The minimum separation between two cluster centers is used to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to √N, where N is the total number of data points in the data set, and the value of the proposed validity index is recorded for each partitioning. The minimum value of the FVQ index over these √N − 1 partitions indicates the appropriate partitioning and the number of clusters. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For comparison, the number of clusters identified by a well-known cluster validity index, the XB-index, is also reported for these eight data sets.
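The selection procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: plain K-means stands in for GAK-means, the encoding step uses fuzzy c-means style memberships with a fuzzifier m = 2, the decoding step reconstructs each point as a membership-weighted mean of the centers, and the error is normalized by N times the minimum squared distance between centers. The abstract does not give the exact FVQ formula, so every function name and the precise form of the index below are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain K-means (a stand-in for GAK-means, which is not reproduced here)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def reconstruction_error(X, centers, m=2.0):
    """Assumed granulation-degranulation error: encode each point as fuzzy
    memberships to the centers, decode it back, and sum the squared errors."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)             # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)    # encoding (memberships)
    w = u ** m
    X_hat = (w @ centers) / w.sum(axis=1, keepdims=True)  # decoding
    return ((X - X_hat) ** 2).sum()

def min_separation(centers):
    """Minimum squared Euclidean distance between any two centers."""
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2[np.triu_indices(len(centers), k=1)].min()

def index_value(X, centers, m=2.0):
    """Hypothetical normalized index: error / (N * minimum separation)."""
    return reconstruction_error(X, centers, m) / (len(X) * min_separation(centers))

def select_k(X, seed=0, restarts=5):
    """Vary K from 2 to floor(sqrt(N)) and return the K with the smallest index."""
    rng = np.random.default_rng(seed)
    k_max = int(np.sqrt(len(X)))
    scores = {}
    for k in range(2, k_max + 1):
        scores[k] = min(index_value(X, kmeans(X, k, rng)) for _ in range(restarts))
    return min(scores, key=scores.get), scores
```

Calling `select_k(X)` on an `(N, d)` array returns the chosen number of clusters together with the index value for each K tried. The restarts are a design choice of this sketch, added because plain K-means is sensitive to initialization.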
Keywords :
data mining; fuzzy set theory; genetic algorithms; pattern classification; pattern clustering; vector quantisation; K-means clustering algorithm; XB-index; cluster validity index; error function; fuzzy granulation-degranulation criterion; fuzzy vector quantization; genetic algorithm; Books; Clustering algorithms; Clustering methods; Decoding; Genetic algorithms; Machine intelligence; Partitioning algorithms; Scattering; Vector quantization; Virtual manufacturing;
Conference_Title :
Advanced Computing and Communications, 2007. ADCOM 2007. International Conference on
Conference_Location :
Guwahati, Assam
Print_ISBN :
0-7695-3059-1
DOI :
10.1109/ADCOM.2007.19