• DocumentCode
    1548756
  • Title

    Study on the Impact of Partition-Induced Dataset Shift on k-Fold Cross-Validation

  • Author

    Moreno-Torres, J.G. ; Saez, J.A. ; Herrera, Francisco

  • Author_Institution
    Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain
  • Volume
    23
  • Issue
    8
  • fYear
    2012
  • Firstpage
    1304
  • Lastpage
    1312
  • Abstract
    Cross-validation is a very commonly employed technique used to evaluate classifier performance. However, it can potentially introduce dataset shift, a harmful factor that is often not taken into account and can result in inaccurate performance estimation. This paper analyzes the prevalence and impact of partition-induced covariate shift on different k-fold cross-validation schemes. From the experimental results obtained, we conclude that the degree of partition-induced covariate shift depends on the cross-validation scheme considered. In this way, worse schemes may harm the correctness of a single-classifier performance estimation and also increase the needed number of repetitions of cross-validation to reach a stable performance estimation.
  • Keywords
    data handling; dataset shift; k-fold cross validation; partition induced dataset shift; performance estimation; Accuracy; Algorithm design and analysis; Classification algorithms; Partitioning algorithms; Reliability; Testing; Covariate shift; cross-validation; partitioning
  • fLanguage
    English
  • Journal_Title
    Neural Networks and Learning Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2162-237X
  • Type

    jour

  • DOI
    10.1109/TNNLS.2012.2199516
  • Filename
    6226477