• DocumentCode
    3123168
  • Title

    Classification with high-dimensional sparse samples

  • Author

    Huang, Dayu ; Meyn, Sean

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2012
  • fDate
    1-6 July 2012
  • Firstpage
    2586
  • Lastpage
    2590
  • Abstract
    The task of the binary classification problem is to determine which of two distributions has generated a length-n test sequence. The two distributions are unknown; two training sequences of length N, one from each distribution, are observed. The distributions share an alphabet of size m, which is significantly larger than n and N. How does N,n,m affect the probability of classification error? We characterize the achievable error rate in a high-dimensional setting in which N,n,m all tend to infinity, under the assumption that probability of any symbol is O(m-1). The results are: 1) There exists an asymptotically consistent classifier if and only if m = o(min{N2, Nn}). This extends the previous consistency result in [1] to the case N ≠ n. 2) For the sparse sample case where max{n, N} = o(m), finer results are obtained: The best achievable probability of error decays as - log(Pe) = J min {N2, Nn}(1 +o(1))/m with J >; 0. 3) A weighted coincidence-based classifier has non-zero generalized error exponent J. 4) The ℓ2-norm based classifier has J = 0.
  • Keywords
    error statistics; signal classification; signal sampling; ℓ2-norm based classifier; asymptotically consistent classifier; binary classification; classification error; error decays; generalized error exponent; high dimensional sparse samples; length-n test sequence; training sequences; Approximation methods; Information theory; Random variables; Testing; Training; Vectors; Vocabulary; classification; generalized error exponent; high-dimensional model; large deviations; sparse sample;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
  • Conference_Location
    Cambridge, MA
  • ISSN
    2157-8095
  • Print_ISBN
    978-1-4673-2580-6
  • Electronic_ISBN
    2157-8095
  • Type

    conf

  • DOI
    10.1109/ISIT.2012.6283985
  • Filename
    6283985