Title :
New incremental fuzzy c medoids clustering algorithms
Author :
Labroche, Nicolas
Author_Institution :
Lab. d'Inf. de Paris 6, Univ. Pierre et Marie Curie - Paris 6, Paris, France
Abstract :
This paper proposes two new incremental fuzzy c medoids clustering algorithms for very large datasets. These algorithms are tailored to continuous data streams, where not all the data is necessarily available at once or the data cannot fit in main memory. Some fuzzy algorithms already manage large datasets in a similar way, but they are generally limited to spatial datasets in order to avoid the complexity of computing medoids. Our methods keep the advantages of the fuzzy approaches and add the capability to handle large relational datasets by treating the continuous input stream as a sequence of data chunks that are processed one after another. Two distinct models are proposed to aggregate the information discovered from each data chunk and produce the final partition of the dataset. Our new algorithms are compared to state-of-the-art fuzzy clustering algorithms on artificial and real datasets. Experiments show that our new approaches perform comparably to, if not better than, existing algorithms while adding the capability to handle relational data, which better matches the needs of real-world applications.
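The chunk-by-chunk idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not specify the two aggregation models, so the sketch assumes one simple scheme in which the medoids found so far are carried into the next chunk as weighted items. All names (`memberships`, `fuzzy_c_medoids`, `incremental_fuzzy_c_medoids`), the fuzzifier `m = 2`, and the first-`c`-items initialisation are assumptions made for illustration. Only a pairwise dissimilarity function `dist` is required, which is what makes the approach usable on relational data.

```python
def memberships(items, medoids, dist, m=2.0):
    """Fuzzy membership of every item to every medoid (one row per item)."""
    rows = []
    for x in items:
        d = [dist(x, v) for v in medoids]
        zeros = [i for i, di in enumerate(d) if di == 0]
        if zeros:
            # The item coincides with a medoid: give it crisp membership.
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
        else:
            inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
            total = sum(inv)
            row = [v / total for v in inv]
        rows.append(row)
    return rows


def fuzzy_c_medoids(items, weights, c, dist, m=2.0, max_iter=50):
    """Weighted fuzzy c medoids on one chunk.

    Returns the medoids and the fuzzy weight accumulated by each medoid.
    """
    medoids = list(items[:c])  # naive initialisation (assumption of this sketch)
    for _ in range(max_iter):
        u = memberships(items, medoids, dist, m)
        new_medoids = []
        for i in range(c):
            def cost(cand, i=i):
                # Weighted fuzzy cost of using `cand` as the medoid of cluster i.
                return sum(w * (u[j][i] ** m) * dist(cand, items[j])
                           for j, w in enumerate(weights))
            new_medoids.append(min(items, key=cost))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    u = memberships(items, medoids, dist, m)
    medoid_weights = [sum(w * u[j][i] for j, w in enumerate(weights))
                      for i in range(c)]
    return medoids, medoid_weights


def incremental_fuzzy_c_medoids(chunks, c, dist, m=2.0):
    """Process chunks one at a time, carrying weighted medoids forward."""
    medoids, medoid_weights = [], []
    for chunk in chunks:
        # Previous medoids summarise all earlier chunks (assumed aggregation).
        items = medoids + list(chunk)
        weights = medoid_weights + [1.0] * len(chunk)
        medoids, medoid_weights = fuzzy_c_medoids(items, weights, c, dist, m)
    return medoids


# Toy usage: two well-separated groups of numbers arriving in two chunks.
chunks = [[1.0, 10.0, 1.2, 9.8, 0.9], [10.1, 1.1, 9.9, 10.2]]
result = incremental_fuzzy_c_medoids(chunks, 2, lambda a, b: abs(a - b))
```

Because each chunk is clustered together with only `c` carried-over medoids, memory use depends on the chunk size rather than on the length of the stream. The medoid search inside each chunk is still quadratic in the chunk size.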
Keywords :
fuzzy set theory; pattern clustering; continuous data streams; data chunks set; incremental fuzzy c medoids clustering algorithms; very large datasets; Aggregates; Application software; Clustering algorithms; Computer applications; Data mining; Fuzzy sets; Hardware; Internet; Partitioning algorithms; Software algorithms;
Conference_Title :
2010 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS)
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-7859-0
Electronic_ISBN :
978-1-4244-7857-6
DOI :
10.1109/NAFIPS.2010.5548263