Title :
Clustering large datasets with kernel methods
Author :
Fausser, Stefan ; Schwenker, Friedhelm
Author_Institution :
Inst. of Neural Inf. Process., Univ. of Ulm, Ulm, Germany
Abstract :
Real-life datasets are becoming larger and less linearly separable. Divisive clustering methods whose computation time is linear in the number of samples n can handle large data, but they mostly assume linear boundaries between the clusters in input space. Kernel-based clustering methods are able to detect nonlinear boundaries in feature space but have a quadratic computation time O(n^2). In this paper, we propose a meta-algorithm that distributes small subsets of the large dataset, clusters these subsets in parallel, and merges the resulting approximate pseudo-centres repeatedly until the whole dataset has been processed. The meta-algorithm is able to use a wide range of kernel-based clustering methods. Here we integrate Kernel Fuzzy C-Means and Relational Neural Gas. We analytically show that the algorithm has a linear computation time O(n). In the experiments, we empirically evaluate the performance of the method on two real-life datasets.
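The chunk-and-merge idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: plain weighted k-means (Lloyd iterations) is swapped in for Kernel Fuzzy C-Means and Relational Neural Gas, the chunks are processed in a sequential loop rather than in parallel, and the merge rule (re-clustering the pooled centres, weighted by the number of samples each one represents) is an assumption, since the abstract does not specify it. The names `weighted_kmeans`, `cluster_in_chunks`, and `chunk_size` are hypothetical.

```python
import numpy as np


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Weighted Lloyd iterations (assumed stand-in for a kernel clustering method)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct input points.
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centre: shape (len(points), k).
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centres[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Final assignment against the updated centres, used to compute each centre's mass.
    dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    mass = np.array([weights[labels == j].sum() for j in range(k)])
    return centres, mass


def cluster_in_chunks(data, k, chunk_size):
    """Cluster fixed-size chunks one by one and merge their centres (assumed merge rule)."""
    centres = np.empty((0, data.shape[1]))
    mass = np.empty(0)
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        new_centres, new_mass = weighted_kmeans(
            chunk, np.ones(len(chunk)), min(k, len(chunk))
        )
        # Merge step: pool the running centres with the new ones and drop empty centres.
        pool = np.vstack([centres, new_centres])
        pool_mass = np.concatenate([mass, new_mass])
        keep = pool_mass > 0
        pool, pool_mass = pool[keep], pool_mass[keep]
        # Re-cluster the small pool so that at most k centres are carried forward.
        centres, mass = weighted_kmeans(pool, pool_mass, min(k, len(pool)))
    return centres, mass
```

With a fixed `chunk_size` m, each chunk costs a bounded amount of work even if the per-chunk method is quadratic in m, and there are about n/m chunks, so the total cost of this sketch grows linearly with n. That is the intuition behind a linear overall running time, not the paper's own analysis.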
Keywords :
fuzzy set theory; neural nets; pattern clustering; divisive clustering methods; kernel based clustering methods; kernel fuzzy c-means; kernel methods; large datasets clustering; linear computation time; meta-algorithm; nonlinear boundaries; parallelized cluster; pseudo-centre; relational neural gas; Approximation algorithms; Clustering algorithms; Clustering methods; Equations; Kernel; Partitioning algorithms; Prototypes;
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4