Title :
Progressive sampling schemes for approximate clustering in very large data sets
Author :
Bezdek, James C. ; Hathaway, Richard J.
Author_Institution :
Dept. of Comput. Sci., West Florida Univ., Pensacola, FL, USA
Abstract :
The extensible fast fuzzy c-means algorithm (eFFCM) finds clusters in very large digital images. eFFCM identifies a representative subsample of the image, which is then clustered using the fuzzy c-means (FCM) algorithm. The subsample solution is then extended to obtain an approximate clustering of the remaining pixels in the image. This article discusses generalized eFFCM (geFFCM), the extension of eFFCM to general non-image data. First, our extension accelerates literal fuzzy c-means (LFCM) on all (loadable) data sets. Second, geFFCM provides feasibility, that is, a way to find (approximate) clusters, for data sets that are too large to be loaded on a single computer. Our experiments suggest that a chi-squared or divergence test for goodness of fit is sufficient on its own to identify good subsamples. This new subsampling method should be equally effective for acceleration and feasibility with very large (VL) data when used with any extensible clustering algorithm (not just FCM).
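The three steps named in the abstract (pick a representative subsample, cluster it with FCM, extend the result to all points) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`memberships`, `fcm`, `passes_chi2`, `progressive_sample`, `geffcm_sketch`), the equal-width histograms, the 1% starting size and increment, the maximin initialization, and the Wilson-Hilferty approximation of the chi-squared critical value are all assumptions made for illustration. The sketch also assumes the data fit in memory so that full-data histograms can be computed directly.

```python
import numpy as np

def memberships(X, V, m=2.0):
    """Fuzzy memberships of every row of X with respect to centers V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, 1e-12)                 # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means on X; returns the c cluster centers."""
    rng = np.random.default_rng(seed)
    # Assumption: maximin initialization (random first center, then the
    # point farthest from the centers chosen so far).
    V = [X[rng.integers(len(X))]]
    while len(V) < c:
        d2 = ((X[:, None, :] - np.array(V)[None, :, :]) ** 2).sum(axis=2)
        V.append(X[np.argmax(d2.min(axis=1))])
    V = np.array(V, dtype=float)
    for _ in range(iters):
        W = memberships(X, V, m) ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        done = np.linalg.norm(V_new - V) < tol
        V = V_new
        if done:
            break
    return V

def passes_chi2(full, sample, bins=10, z=1.645):
    """Chi-squared goodness-of-fit check of one feature of the sample
    against the full-data histogram (z=1.645 corresponds to alpha=0.05)."""
    edges = np.histogram_bin_edges(full, bins=bins)
    p = np.histogram(full, bins=edges)[0] / len(full)
    obs = np.histogram(sample, bins=edges)[0]
    keep = p > 0                             # skip empty bins
    expected = len(sample) * p[keep]
    stat = ((obs[keep] - expected) ** 2 / expected).sum()
    k = int(keep.sum()) - 1                  # degrees of freedom
    if k < 1:
        return True
    # Assumption: Wilson-Hilferty approximation of the critical value.
    crit = k * (1 - 2 / (9 * k) + z * np.sqrt(2 / (9 * k))) ** 3
    return stat <= crit

def progressive_sample(X, start=0.01, step=0.01, bins=10, seed=0):
    """Grow a random sample until every feature passes the test."""
    rng = np.random.default_rng(seed)
    n = len(X)
    order = rng.permutation(n)
    size = max(int(start * n), 5 * bins)
    while size < n:
        idx = order[:size]
        if all(passes_chi2(X[:, j], X[idx, j], bins) for j in range(X.shape[1])):
            return idx
        size += max(int(step * n), 1)
    return order                             # fall back to all points

def geffcm_sketch(X, c, m=2.0, seed=0):
    """Sample, cluster the sample, then extend memberships to all of X."""
    idx = progressive_sample(X, seed=seed)
    V = fcm(X[idx], c, m=m, seed=seed)
    U = memberships(X, V, m)                 # extension step
    return V, U, idx
```

Calling `geffcm_sketch(X, c=2)` on an `(n, d)` float array returns the centers found on the subsample, an `(n, c)` membership matrix for all points, and the indices of the sampled rows. The extension step costs a single pass over the data, which is where the speedup over running FCM on every point would come from.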
Keywords :
computer vision; fuzzy set theory; pattern clustering; very large databases; approximate clustering; extensible fast fuzzy c-means algorithm; progressive sampling schemes; very large data sets; very large digital images; Acceleration; Clustering algorithms; Computer science; Digital images; Fuzzy logic; Fuzzy sets; Pixel; Sampling methods; Scalability; Testing;
Conference_Title :
Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
Print_ISBN :
0-7803-8353-2
DOI :
10.1109/FUZZY.2004.1375677