Author_Institution :
Dept. Comput. Sci. & Technol., Northern Jiaotong Univ., Beijing, China
Abstract :
Many partitional clustering algorithms originate from the definition of the mean. We propose a new clustering model, the general c-means clustering algorithm (GCM). Generally, when a data set is clustered into c (c > 1) subsets, each subset is expected to have a prototype (or cluster center) different from the others. We therefore propose a definition of the undesirable solution of a clustering algorithm. Since the GCM has an undesirable solution under a mild condition, that undesirable solution is expected not to be stable. From these assumptions, we obtain necessary conditions for the GCM to be a good clustering model. These conditions offer a theoretical basis for selecting the parameters of many clustering algorithms, which has been an open problem for such algorithms. For example, we obtain a theoretical rule for selecting the weighting exponent in the FCM and explain why the weighting exponent should be greater than 1. Moreover, we discover the relation between the GCM model and Occam's razor, which offers a deeper reason behind many well-known partitional clustering algorithms. Based on these results, many objective-function-based clustering algorithms can be studied.
Keywords :
Occam; algorithm theory; data structures; image matching; image processing; least mean squares methods; maximum likelihood detection; pattern clustering; FCM; GCM; Occam's razor; biology; data mining; data set clustering; general c-means clustering model; image processing; least mean square method; partitional clustering algorithm; pattern recognition; remote sensing; theoretical selection rule; weighting exponent; Annealing; Application software; Biological system modeling; Clustering algorithms; Computer science; Partitioning algorithms; Pattern recognition; Prototypes; Remote sensing; Temperature
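Illustration :
The abstract's remarks on the FCM weighting exponent and on undesirable solutions can be made concrete with a minimal sketch of one standard fuzzy c-means (FCM) iteration. This is the textbook FCM update, not the GCM model itself, whose form the abstract does not give. The function names (`fcm_memberships`, `fcm_centers`, `fcm_step`), the toy data, and the reading of "undesirable solution" as the case where all prototypes coincide are assumptions made for illustration only. The membership update uses the exponent -1/(m - 1) on squared distances, which is undefined at m = 1, so the sketch rejects m <= 1.

```python
import numpy as np


def fcm_memberships(X, V, m=2.0, eps=1e-12):
    """Standard FCM membership update for data X (n, d) and prototypes V (c, d)."""
    if m <= 1.0:
        raise ValueError("weighting exponent m must be greater than 1")
    # Squared Euclidean distances, shape (c, n); eps guards against division by zero.
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    # u_ik is proportional to d_ik^(-2/(m-1)), normalised over the clusters i.
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=0, keepdims=True)


def fcm_centers(X, U, m=2.0):
    """Standard FCM prototype update: mean of X weighted by u_ik^m."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)


def fcm_step(X, V, m=2.0):
    """One FCM iteration: update memberships, then prototypes."""
    U = fcm_memberships(X, V, m)
    return fcm_centers(X, U, m), U


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
    # Assumed "undesirable" start: both prototypes placed at the data mean.
    V0 = np.tile(X.mean(axis=0), (2, 1))
    V1, U = fcm_step(X, V0, m=2.0)
    print("memberships:\n", U)
    print("prototypes after one step:\n", V1)
```

When every prototype sits at the data mean, all memberships equal 1/c and the prototype update returns the mean again, so coincident prototypes are a fixed point of the FCM iteration for any m > 1. Whether that fixed point is stable is the kind of question the abstract ties to the choice of the weighting exponent.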