DocumentCode
3099876
Title
On optimal data split for generalization estimation and model selection
Author
Larsen, Jan ; Goutte, Cyril
Author_Institution
Dept. of Math. Modeling, Tech. Univ. Denmark, Lyngby, Denmark
fYear
1999
fDate
Aug 1999
Firstpage
225
Lastpage
234
Abstract
The paper studies the very different behavior of the two data splits under hold-out cross-validation, K-fold cross-validation, and randomized permutation cross-validation. First, we describe the theoretical basics of various cross-validation techniques, whose purpose is to reliably estimate the generalization error and to optimize the model structure. The paper then deals with the simple problem of estimating a single location parameter. This problem is tractable because it permits non-asymptotic theoretical analysis, whereas mainly asymptotic analysis and simulation studies are viable for the more complex AR models and neural networks.
Keywords
autoregressive processes; estimation theory; generalisation (artificial intelligence); learning (artificial intelligence); modelling; neural nets; parameter estimation; probability; K-fold cross-validation; generalization error; generalization estimation; hold-out cross-validation; model selection; model structure; nonasymptotic theoretical analysis; optimal data split; randomized permutation cross-validation; Cost function; Design optimization; Electronic mail; Mathematical model; Neural networks; Predictive models; Reliability theory; Robustness; Testing; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop.
Conference_Location
Madison, WI
Print_ISBN
0-7803-5673-X
Type
conf
DOI
10.1109/NNSP.1999.788141
Filename
788141