DocumentCode :
1761033
Title :
Consistent Procedures for Cluster Tree Estimation and Pruning
Author :
Chaudhuri, Kamalika ; Dasgupta, S. ; Kpotufe, Samory ; von Luxburg, Ulrike
Author_Institution :
Univ. of California at San Diego, La Jolla, CA, USA
Volume :
60
Issue :
12
fYear :
2014
fDate :
Dec. 2014
Firstpage :
7900
Lastpage :
7912
Abstract :
For a density f on Rd, a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f . We present two procedures for estimating the cluster tree given samples from f . The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms, which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.
Keywords :
computational complexity; estimation theory; pattern clustering; trees (mathematics); cluster tree estimation; cluster tree pruning; finite-sample convergence rates; hierarchical clustering; high-density cluster; k-nearest neighbor graph; lower bounds; sample complexity; single linkage algorithm; Algorithm design and analysis; Clustering algorithms; Convergence; Couplings; Estimation; Level set; Robustness; Clustering algorithms; convergence;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2014.2361055
Filename :
6915900
Link To Document :
بازگشت