Regional Information Center for Science and Technology - Progressive sampling schemes for approximate clustering in very large data sets

DocumentCode :

2251344

Title :

Progressive sampling schemes for approximate clustering in very large data sets

Author :

Bezdek, James C. ; Hathaway, Richard J.

Author_Institution :

Dept. of Comput. Sci., West Florida Univ., Pensacola, FL, USA

Volume :

fYear :

2004

fDate :

25-29 July 2004

Firstpage :

Abstract :

The extensible fast fuzzy c-means algorithm (eFFCM) finds clusters in very large digital images. eFFCM identifies a representative subsample of the image, which is then clustered using the fuzzy c-means (FCM) algorithm. The subsample solution is then extended to obtain an approximate clustering of the remaining pixels in the image. This article discusses generalized eFFCM (geFFCM), the extension of eFFCM to general non-image data. First, geFFCM accelerates literal fuzzy c-means (LFCM) on all (loadable) data sets. Second, geFFCM provides feasibility, a way to find (approximate) clusters, for data sets that are too large to be loaded into a single computer. Our experiments suggest that the chi-squared or divergence test for goodness of fit alone identifies good subsamples. This new subsampling method should be equally effective for acceleration and feasibility with very large (VL) data by any extensible clustering algorithm (not just FCM).
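
The pipeline described in the abstract (grow a subsample until it passes a goodness-of-fit test, cluster the subsample with FCM, then extend the result to all points) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 10-bin histograms, the 5% starting size and increment, the 0.05 significance level, and the Wilson-Hilferty approximation of the chi-squared critical value are all assumptions made for illustration. Only the chi-squared test is shown, not the divergence test. The sketch also assumes the data fit in memory.

```python
import math
import random

def bin_counts(values, lo, width, bins):
    """Equal-width histogram; the maximum value is clamped into the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def sample_passes(data, sample, bins=10, z=1.645):
    """Per-feature chi-squared goodness-of-fit check of sample vs. full data."""
    df = bins - 1
    # Assumption: Wilson-Hilferty approximation of the critical value
    # (z = 1.645 corresponds to a 0.05 significance level).
    crit = df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3
    for f in range(len(data[0])):
        col = [x[f] for x in data]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # guard against constant features
        pop = bin_counts(col, lo, width, bins)
        obs = bin_counts([x[f] for x in sample], lo, width, bins)
        stat = 0.0
        for p, o in zip(pop, obs):
            expected = len(sample) * p / len(data)
            if expected > 0:
                stat += (o - expected) ** 2 / expected
        if stat > crit:
            return False
    return True

def progressive_sample(data, start=0.05, step=0.05, rng=None):
    """Grow a random subsample until every feature passes the test."""
    rng = rng or random.Random(0)
    order = list(range(len(data)))
    rng.shuffle(order)
    n = max(1, int(start * len(data)))
    while n < len(data):
        sample = [data[i] for i in order[:n]]
        if sample_passes(data, sample):
            return sample
        n += max(1, int(step * len(data)))
    return list(data)  # fall back to the full data set

def memberships(x, centers, m=2.0):
    """FCM membership of one point in each cluster, given the centers."""
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:  # point coincides with at least one center
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [w / total for w in inv]

def fcm(points, c, m=2.0, iters=100, tol=1e-6, rng=None):
    """Plain fuzzy c-means by alternating membership and center updates."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, c)]
    for _ in range(iters):
        u = [memberships(x, centers, m) for x in points]
        new = []
        for i in range(c):
            w = [uk[i] ** m for uk in u]
            total = sum(w)
            new.append([sum(wk * x[f] for wk, x in zip(w, points)) / total
                        for f in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers

def approximate_clustering(data, c):
    """Cluster a tested subsample, then extend memberships to all points."""
    sample = progressive_sample(data)
    centers = fcm(sample, c)
    return centers, [memberships(x, centers) for x in data], len(sample)
```

Calling `approximate_clustering(data, c)` on a list of equal-length numeric tuples returns the cluster centers fitted on the subsample, a membership row for every point in the full data set, and the subsample size that was accepted. The extension step only evaluates the membership formula once per point, which is where the saving over running FCM on the full data would come from in this sketch.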

Keywords :

computer vision; fuzzy set theory; pattern clustering; very large databases; approximate clustering; extensible fast fuzzy c-means algorithm; progressive sampling schemes; very large data sets; very large digital images; Acceleration; Clustering algorithms; Computer science; Digital images; Fuzzy logic; Fuzzy sets; Pixel; Sampling methods; Scalability; Testing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on

ISSN :

1098-7584

Print_ISBN :

0-7803-8353-2

Type :

conf

DOI :

10.1109/FUZZY.2004.1375677

Filename :

1375677

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2251344