• DocumentCode
    74805
  • Title
    Density-Preserving Sampling: Robust and Efficient Alternative to Cross-Validation for Error Estimation
  • Author
    Budka, Marcin; Gabrys, Bogdan
  • Author_Institution
    Smart Technology Research Centre, Bournemouth University, Poole, UK
  • Volume
    24
  • Issue
    1
  • fYear
    2013
  • fDate
    Jan. 2013
  • Firstpage
    22
  • Lastpage
    34
  • Abstract
    Estimating the generalization ability of a classification or regression model is important because it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or the bootstrap, are stochastic and thus require multiple repetitions to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need to repeat the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This yields low-variance error estimates with an accuracy comparable to 10-times-repeated CV at a fraction of the computation that CV requires. The method can also be used for model ranking and selection. The paper derives the DPS procedure and investigates its usability and performance on a set of public benchmark datasets with standard classifiers.
  • Keywords
    learning (artificial intelligence); pattern classification; regression analysis; sampling methods; DPS procedure; classification model; correntropy-inspired density-preserving sampling procedure; cross-validation; density-preserving sampling; generalization error estimation procedures; low-variance error estimates; machine learning; regression model; Accuracy; Computational modeling; Error analysis; Joints; Kernel; Standards; Training; Bootstrap; correntropy; error estimation; model selection; sampling
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Neural Networks and Learning Systems
  • Publisher
    IEEE
  • ISSN
    2162-237X
  • Type
    jour
  • DOI
    10.1109/TNNLS.2012.2222925
  • Filename
    6360017