Title :
On optimal data split for generalization estimation and model selection
Author :
Larsen, Jan ; Goutte, Cyril
Author_Institution :
Dept. of Math. Modeling, Tech. Univ. Denmark, Lyngby, Denmark
Abstract :
The paper is concerned with studying the very different behavior of the two data splits using hold-out cross-validation, K-fold cross-validation and randomized permutation cross-validation. First, we describe the theoretical basics of various cross-validation techniques with the purpose of reliably estimating the generalization error and optimizing the model structure. The paper deals with the simple problem of estimating a single location parameter. This problem is tractable because a non-asymptotic theoretical analysis is possible, whereas mainly asymptotic analysis and simulation studies are viable for the more complex AR models and neural networks.
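To make the three schemes named in the abstract concrete, here is a minimal sketch for the single-location-parameter problem. It assumes squared-error loss and the sample mean as the estimator; both are illustrative choices, not taken from the record. "Randomized permutation cross-validation" is read here as averaging the hold-out estimate over repeated random re-splits of the same data, which is also an assumption. All function names (`fit`, `holdout_cv`, `kfold_cv`, `permutation_cv`, `expected_gen_error`) are hypothetical and are not the authors' code.

```python
import random
import statistics

# Assumptions (illustrative only): squared-error loss, and the sample mean
# as the estimator of the single location parameter.

def fit(train):
    """Estimate the location parameter by the sample mean of the training set."""
    return statistics.fmean(train)

def val_error(val, mu_hat):
    """Average squared error of mu_hat over the validation points."""
    return statistics.fmean([(x - mu_hat) ** 2 for x in val])

def holdout_cv(data, n_val):
    """Hold-out CV: train on the first N - n_val points, validate on the last n_val."""
    data = list(data)
    if not 0 < n_val < len(data):
        raise ValueError("need 0 < n_val < N")
    train, val = data[:-n_val], data[-n_val:]
    return val_error(val, fit(train))

def kfold_cv(data, k):
    """K-fold CV with contiguous folds; every point is validated exactly once.

    Returns the mean squared validation error over all N points.
    """
    data = list(data)
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= N")
    total = 0.0
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        val = data[lo:hi]
        mu_hat = fit(data[:lo] + data[hi:])  # train on the other k - 1 folds
        total += sum((x - mu_hat) ** 2 for x in val)
    return total / n

def permutation_cv(data, n_val, repeats, seed=0):
    """Assumed reading of randomized permutation CV: average the hold-out
    estimate over `repeats` random permutations of the data."""
    rng = random.Random(seed)
    data = list(data)
    estimates = []
    for _ in range(repeats):
        perm = data[:]
        rng.shuffle(perm)
        estimates.append(holdout_cv(perm, n_val))
    return statistics.fmean(estimates)

def expected_gen_error(sigma2, n_train):
    """E[(x - mu_hat)^2] for a fresh point x when mu_hat is the mean of
    n_train i.i.d. points with variance sigma2: sigma2 * (1 + 1/n_train)."""
    return sigma2 * (1 + 1 / n_train)

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print("hold-out (20 val):", holdout_cv(data, 20))
    print("10-fold:          ", kfold_cv(data, 10))
    print("permutation (200):", permutation_cv(data, 20, 200))
    print("true, N_train=80: ", expected_gen_error(1.0, 80))
```

Each estimate targets the generalization error of a model trained on fewer than N points (N - n_val for hold-out and permutation CV, N(K - 1)/K for K-fold), which is why `expected_gen_error` takes the training-set size rather than N. The split ratio trades this bias against the variance of the estimate, and that trade-off is what the optimal data split question is about.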
Keywords :
autoregressive processes; estimation theory; generalisation (artificial intelligence); learning (artificial intelligence); modelling; neural nets; parameter estimation; probability; K-fold cross-validation; generalization error; generalization estimation; hold-out cross-validation; model selection; model structure; nonasymptotic theoretical analysis; optimal data split; randomized permutation cross-validation; Cost function; Design optimization; Electronic mail; Mathematical model; Neural networks; Predictive models; Reliability theory; Robustness; Testing; Training data;
Conference_Title :
Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop.
Conference_Location :
Madison, WI
Print_ISBN :
0-7803-5673-X
DOI :
10.1109/NNSP.1999.788141