• DocumentCode
    3099876
  • Title
    On optimal data split for generalization estimation and model selection
  • Author
    Larsen, Jan; Goutte, Cyril
  • Author_Institution
    Dept. of Math. Modeling, Tech. Univ. Denmark, Lyngby, Denmark
  • fYear
    1999
  • fDate
    Aug 1999
  • Firstpage
    225
  • Lastpage
    234
  • Abstract
    This paper studies the markedly different behavior of two data splits, one for generalization estimation and one for model selection, under hold-out cross-validation, K-fold cross-validation, and randomized permutation cross-validation. We first describe the theoretical basis of these cross-validation techniques, whose purpose is to estimate the generalization error reliably and to optimize the model structure. The analysis is restricted to the simple problem of estimating a single location parameter. This problem is tractable because a non-asymptotic theoretical analysis is possible, whereas for more complex models such as AR models and neural networks mainly asymptotic analysis and simulation studies are viable.
  • Keywords
    autoregressive processes; estimation theory; generalisation (artificial intelligence); learning (artificial intelligence); modelling; neural nets; parameter estimation; probability; K-fold cross-validation; generalization error; generalization estimation; hold-out cross-validation; model selection; model structure; nonasymptotic theoretical analysis; optimal data split; randomized permutation cross-validation; Cost function; Design optimization; Electronic mail; Mathematical model; Neural networks; Predictive models; Reliability theory; Robustness; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop.
  • Conference_Location
    Madison, WI
  • Print_ISBN
    0-7803-5673-X
  • Type
    conf
  • DOI
    10.1109/NNSP.1999.788141
  • Filename
    788141