Title :
Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
Author :
Arias-Castro, Ery
Author_Institution :
Dept. of Math., Univ. of California, La Jolla, CA, USA
fDate :
3/1/2011 12:00:00 AM
Abstract :
In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes, and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guaranties for some emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; hierarchical clustering with single linkage; and the spectral clustering method of Ng, Jordan, and Weiss. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first and third methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
Keywords :
graph theory; pattern clustering; spectral analysis; Euclidean ambient space; Perona method; Zelnik-Manor method; asymptotic setting; emblematic methods; hierarchical clustering; local scaling method; mixed dimensions; neighborhood graph; pairwise distances; single linkage; spectral clustering method; spectral gap; Clustering algorithms; Clustering methods; Context; Couplings; Kernel; Manifolds; Partitioning algorithms; Clustering; detection in point clouds; extracting connected components; hierarchical clustering with single linkage; manifold learning; minimax rates; nearest-neighbor search; neighborhood graphs; random geometric graphs; spectral methods;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2011.2104630