  • DocumentCode
    2493694
  • Title
    Correntropy-based density-preserving data sampling as an alternative to standard cross-validation
  • Author
    Budka, Marcin; Gabrys, Bogdan
  • Author_Institution
    School of Design, Engineering & Computing, Bournemouth University, Poole, UK
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Estimating the generalization ability of a predictive model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or the bootstrap, are stochastic and thus require multiple repetitions to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-based Density Preserving Sampling (DPS) procedure proposed in this paper eliminates the need to repeat the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This makes it possible to produce low-variance error estimates with accuracy comparable to 10-times repeated cross-validation at a fraction of the computational cost of CV, as investigated on a set of publicly available benchmark datasets using standard classifiers.
  • Keywords
    data analysis; entropy; generalisation (artificial intelligence); learning (artificial intelligence); sampling methods; correntropy-based density-preserving data sampling; cross-validation; generalization ability; generalization error estimation; low variance error estimate; model selection; predictive model; Entropy; Error analysis; Estimation; Kernel; Machine learning; Mathematical model; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    The 2010 International Joint Conference on Neural Networks (IJCNN)
  • Conference_Location
    Barcelona
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-6916-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2010.5596717
  • Filename
    5596717
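The abstract describes DPS only at a high level, so the following Python sketch is illustrative and is not the authors' algorithm. It assumes a greedy nearest-neighbour halving rule (hypothetical `halve` and `dps_like_folds`) to build 2**levels folds. It then scores how well a fold represents the full dataset with a Gaussian-kernel discrepancy. In information-theoretic learning, correntropy is the expectation of a kernel applied to the difference of two variables, E[κσ(X − Y)]. Averaging the same kernel over all pairs drawn from two point sets gives the cross-information potential (hypothetical `cross_information_potential`), and `parzen_discrepancy` combines three such terms, which amounts to a biased estimate of the squared maximum mean discrepancy. Both functions are assumed stand-ins for the paper's correntropy criterion, and the kernel width `sigma` is an assumed free parameter.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_information_potential(xs, ys, sigma=1.0):
    """Mean Gaussian-kernel similarity over all pairs (x, y), x in xs, y in ys.

    Assumed stand-in for a correntropy-style set similarity; the kernel is
    left unnormalised, so a point compared with itself scores 1.0.
    """
    total = sum(math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))
                for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def parzen_discrepancy(subset, full, sigma=1.0):
    """V(S,S) + V(D,D) - 2 V(S,D): a biased squared-MMD estimate that is near 0
    when the subset covers the same regions as the full set (assumed criterion)."""
    v = cross_information_potential
    return v(subset, subset, sigma) + v(full, full, sigma) - 2.0 * v(subset, full, sigma)


def halve(points, idx):
    """Hypothetical greedy halving: take the first unassigned point, find its
    nearest unassigned neighbour and send one point of the pair to each half."""
    remaining = list(idx)
    left, right = [], []
    while len(remaining) > 1:
        i = remaining.pop(0)
        j = min(remaining, key=lambda k: sq_dist(points[i], points[k]))
        remaining.remove(j)
        left.append(i)
        right.append(j)
    left.extend(remaining)  # leftover point when the fold size is odd
    return left, right


def dps_like_folds(points, levels):
    """Split the indices of `points` into 2**levels folds by recursive halving."""
    folds = [list(range(len(points)))]
    for _ in range(levels):
        folds = [half for fold in folds for half in halve(points, fold)]
    return folds


if __name__ == "__main__":
    # Toy data: four tight pairs of points in four well-separated clusters.
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
            (0.0, 5.0), (0.1, 5.0), (5.0, 0.0), (5.1, 0.0)]
    folds = dps_like_folds(data, levels=1)
    print(folds)  # [[0, 2, 4, 6], [1, 3, 5, 7]]
    for fold in folds:
        print(parzen_discrepancy([data[i] for i in fold], data))
```

Pairing each point with its nearest neighbour before splitting is what lets both halves cover the same regions of the input space in a single pass. A purely random split gives no such guarantee, which is why CV is repeated to stabilise its estimate.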