  • DocumentCode
    797664
  • Title
    Rates of convergence of nearest neighbor estimation under arbitrary sampling
  • Author
    Kulkarni, Sanjeev R.; Posner, Steven E.
  • Author_Institution
    Dept. of Electr. Eng., Princeton Univ., NJ, USA
  • Volume
    41
  • Issue
    4
  • fYear
    1995
  • fDate
    7/1/1995
  • Firstpage
    1028
  • Lastpage
    1039
  • Abstract
    Rates of convergence for nearest neighbor estimation are established in a general framework in terms of metric covering numbers of the underlying space. The first result gives explicit finite-sample upper bounds for the classical independent and identically distributed (i.i.d.) random sampling problem in a separable metric space setting. The convergence rate is a function of the covering numbers of the support of the distribution. For example, for bounded subsets of R^r, the convergence rate is O(1/n^(2/r)). The main result extends the problem to allow samples drawn from a completely arbitrary random process in a separable metric space and examines the performance in terms of the individual sample sequences. The authors show that for every sequence of samples the asymptotic time-average of nearest neighbor risks equals twice the time-average of the conditional Bayes risks of the sequence. Finite-sample upper bounds under arbitrary sampling are again obtained in terms of the covering numbers of the underlying space. In particular, for bounded subsets of R^r the convergence rate of the time-averaged risk is O(1/n^(2/r)). The authors then establish a consistency result for k_n-nearest neighbor estimation under arbitrary sampling and prove a convergence rate matching established rates for i.i.d. sampling. Finally, they show how their arbitrary sampling results lead to some classical i.i.d. sampling results, and in fact extend them to stationary sampling. The framework and results are quite general, while the proof techniques are surprisingly elementary.
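    A minimal numerical illustration (our own sketch, not the authors' construction): the Python snippet below runs the 1-NN rule online on a non-i.i.d. sample sequence in a bounded subset of R^2 and tracks the time-averaged risk discussed in the abstract. The drifting sampler, the noisy threshold labels, and the helper nn_predict are purely illustrative assumptions.

        import numpy as np

        def nn_predict(train_x, train_y, query):
            """Return the label of the nearest stored sample (1-NN rule)."""
            dists = np.linalg.norm(train_x - query, axis=1)
            return train_y[np.argmin(dists)]

        # Arbitrary (here: slowly drifting, hence non-i.i.d.) sampling process
        # on the bounded set [0,1)^2; labels follow a noisy threshold rule.
        # All parameters below are illustrative choices, not from the paper.
        rng = np.random.default_rng(0)
        r, n = 2, 2000
        drift = np.cumsum(rng.normal(scale=0.01, size=(n, r)), axis=0)
        xs = (rng.uniform(size=(n, r)) + drift) % 1.0
        ys = (xs.sum(axis=1) + rng.normal(scale=0.1, size=n) > 1.0).astype(int)

        # Online 1-NN prediction: classify each new point from all past points,
        # then add it to the training set; errors/i is the time-averaged risk.
        errors = 0
        for i in range(1, n):
            errors += int(nn_predict(xs[:i], ys[:i], xs[i]) != ys[i])
        print(f"time-averaged 1-NN risk after {n-1} rounds: {errors/(n-1):.3f}")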
  • Keywords
    convergence; entropy; estimation theory; random processes; signal sampling; arbitrary sampling; asymptotic time-average; bounded subsets; classical independent identically distributed random sampling problem; conditional Bayes risks; convergence rate; explicit finite sample upper bounds; metric covering numbers; nearest neighbor estimation; nearest neighbor risks; sample sequences; separable metric space; separable metric space setting; sequence; stationary sampling; time-averaged risk; Convergence; Entropy; Extraterrestrial measurements; Nearest neighbor searches; Neural networks; Random processes; Random variables; Sampling methods; Student members; Upper bound
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/18.391248
  • Filename
    391248