Title of article
Intrinsic dimension identification via graph-theoretic methods
Author/Authors
Brito, M.R. and Quiroz, A.J. and Yukich, J.E.
Issue Information
Biannual, with serial issue number, year 2013
Pages
15
From page
263
To page
277
Abstract
Three graph theoretical statistics are considered for the problem of estimating the intrinsic dimension of a data set. The first is the “reach” statistic, $\bar{r}_{j,k}$, proposed in Brito et al. (2002) [4] for the problem of identification of Euclidean dimension. The second, $M_n$, is the sample average of squared degrees in the minimum spanning tree of the data, while the third statistic, $U_n^k$, is based on counting the number of common neighbors among the $k$-nearest, for each pair of sample points $\{X_i, X_j\}$, $i < j \le n$. For the first and third of these statistics, central limit theorems are proved under general assumptions, for data living in an $m$-dimensional $C^1$ submanifold of $\mathbb{R}^d$, and in this setting, we establish the consistency of intrinsic dimension identification procedures based on $\bar{r}_{j,k}$ and $U_n^k$. For $M_n$, asymptotic results are provided whenever data live in an affine subspace of Euclidean space. The graph theoretical methods proposed are compared, via simulations, with a host of recently proposed nearest neighbor alternatives.
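To make the second statistic concrete, the following is a minimal sketch of a "sample average of squared degrees in the minimum spanning tree", taken literally from the sentence in the abstract. The function names `mst_degrees` and `mean_squared_mst_degree` are hypothetical, the tree is built with a plain O(n^2) Prim's algorithm on Euclidean distances, and any normalization or centering the paper may apply to $M_n$ is not reproduced here.

```python
import math


def mst_degrees(points):
    """Degree of each point in the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    degrees = [0] * n
    if n < 2:
        return degrees
    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each outside point to the current tree
    best_link = [-1] * n        # tree vertex that realizes that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_link[j] = 0
    for _ in range(n - 1):
        # Attach the outside point that is closest to the tree.
        v = min((j for j in range(n) if not in_tree[j]), key=lambda j: best_dist[j])
        in_tree[v] = True
        degrees[v] += 1
        degrees[best_link[v]] += 1
        # Update the remaining points' distances to the enlarged tree.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[v], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_link[j] = v
    return degrees


def mean_squared_mst_degree(points):
    """Average of squared MST degrees over the sample (assumed reading of M_n)."""
    degrees = mst_degrees(points)
    return sum(d * d for d in degrees) / len(degrees)
```

For equally spaced points on a line the tree is a path, so four points give degrees 1, 2, 2, 1 and an average squared degree of 2.5. The intuition behind using such a quantity is that vertices of a spanning tree can have more neighbors when the data spread out in more directions, which is why it can carry information about dimension.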
Keywords
Intrinsic dimension, Graph theoretical methods, Dimensionality reduction, Stabilization methods
Journal title
Journal of Multivariate Analysis
Serial Year
2013
Record number
1566208