Title :
Accelerating Fuzzy-C Means Using an Estimated Subsample Size
Author :
Parker, Jonathon K. ; Hall, Lawrence O.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
Many algorithms designed to accelerate the fuzzy c-means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, i.e., geometric progressive fuzzy c-means (GOFCM) and minimum sample estimate random fuzzy c-means (MSERFCM), that use a statistical method to estimate the subsample size. GOFCM, which is a variant of single-pass fuzzy c-means (SPFCM), also leverages progressive sampling. MSERFCM, which is a variant of random sampling plus extension fuzzy c-means, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared with FCM and four accelerated variants of FCM. GOFCM´s speedup was four-47 times that of FCM and faster than SPFCM on each of the six datasets that are used in the experiments. For five of the datasets, partitions were within 1% of those of FCM. MSERFCM´s speedup was five-26 times that of FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and the final partitions are very close to those of FCM.
Keywords :
fuzzy set theory; geometry; pattern clustering; sampling methods; FCM clustering algorithm; GOFCM; MSERFCM; SPFCM; accelerated clustering; fuzzy c-means clustering algorithm; geometric progressive fuzzy c-means algorithm; minimum sample estimate random fuzzy c-means algorithm; progressive sampling; random sampling; single-pass fuzzy c-means; stopping criterion; subsample size estimation; Acceleration; Algorithm design and analysis; Clustering algorithms; Optimization; Partitioning algorithms; Runtime; Statistical analysis; Accelerated; extensible fast fuzzy c-means (EFFCM); fuzzy c-means (FCM); fuzzy clustering; geometric progressive fuzzy c-means (GOFCM); minimum sample estimate random fuzzy c-means (MSERFCM); multinomial proportion; online fuzzy c-means (OFCM); progressive sampling; random sampling plus extension fuzzy c-means (rseFCM); sampling; scalable; single-pass fuzzy c-means (SPFCM); stopping criterion;
Journal_Title :
Fuzzy Systems, IEEE Transactions on
DOI :
10.1109/TFUZZ.2013.2286993