Title of article
Ascertainment of the number of samples in the validation set in Monte Carlo cross validation and the selection of model dimension with Monte Carlo cross validation
Author/Authors
Du, Yi Ping; Kasemsumran, Sumaporn; Maruo, Katsuhiko; Nakagawa, Takehiro; Ozaki, Yukihiro
Issue Information
Biannual, serial-numbered issue, year 2006
Pages
7
From page
83
To page
89
Abstract
Monte Carlo cross validation (MCCV) is applied to two data sets, containing 125 and 1643 near-infrared (NIR) spectra of biological samples, respectively, to determine the number of samples to leave out for validation in MCCV and, consequently, the dimension of the PLS models. Once the number of samples in the validation set has been selected, a suitable number of latent variables (LV) can be chosen correctly. The results show that the root mean squared error of calibration (RMSEC), the root mean squared error of cross validation (RMSECV), and the LV number are sensitive to the number of samples left out for validation when too many samples are left out. On this basis, RMSEC and RMSECV are suggested as criteria for determining the number of samples to leave out for validation in MCCV. The method is easy and convenient to use. For a larger data set, more samples may be left out, but the suitable number of samples left out decreases if the measurement error level is high.
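The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-response PLS model (PLS1) fitted with NIPALS, synthetic data in place of NIR spectra, and a minimum-RMSECV rule for picking the LV number. The names `pls1_fit`, `pls1_predict`, and `mccv`, and all parameter values, are hypothetical.

```python
import math
import random


def pls1_fit(X, y, n_lv):
    """Fit PLS1 with NIPALS on mean-centred data (lists of floats)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(m)] for r in X]  # X residuals
    f = [v - y_mean for v in y]                            # y residuals
    comps = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left between X and y
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(E[i], w)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(a * b for a, b in zip(f, t)) / tt
        # Deflate X and y by the component just extracted.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, row, n_lv):
    """Predictions for one sample using 1, 2, ..., n_lv latent variables."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(row, x_mean)]
    yhat, out = y_mean, []
    for k in range(n_lv):
        if k < len(comps):
            w, p, q = comps[k]
            t = sum(a * b for a, b in zip(e, w))
            yhat += q * t
            e = [a - t * b for a, b in zip(e, p)]
        out.append(yhat)
    return out


def mccv(X, y, n_out, max_lv, n_splits=30, seed=0):
    """Monte Carlo CV: each random split leaves n_out samples out.
    Returns (rmsec, rmsecv), one value per number of latent variables."""
    rng = random.Random(seed)
    n = len(X)
    sse_c, sse_v = [0.0] * max_lv, [0.0] * max_lv
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        val, cal = idx[:n_out], idx[n_out:]
        model = pls1_fit([X[i] for i in cal], [y[i] for i in cal], max_lv)
        for group, sse in ((cal, sse_c), (val, sse_v)):
            for i in group:
                for k, yhat in enumerate(pls1_predict(model, X[i], max_lv)):
                    sse[k] += (y[i] - yhat) ** 2
    rmsec = [math.sqrt(s / (n_splits * (n - n_out))) for s in sse_c]
    rmsecv = [math.sqrt(s / (n_splits * n_out)) for s in sse_v]
    return rmsec, rmsecv


# Synthetic stand-in for spectra: 40 samples, 6 variables, 2 hidden factors.
rng = random.Random(1)
X, y = [], []
for _ in range(40):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([a * (j + 1) + b * (6 - j) + rng.gauss(0, 0.1) for j in range(6)])
    y.append(2 * a - b + rng.gauss(0, 0.1))

# Scan several validation-set sizes and report the errors at the chosen LV.
for n_out in (4, 12, 20, 32):
    rmsec, rmsecv = mccv(X, y, n_out, max_lv=4)
    best_lv = rmsecv.index(min(rmsecv)) + 1  # assumed rule: minimum RMSECV
    print(n_out, best_lv, round(rmsec[best_lv - 1], 3), round(rmsecv[best_lv - 1], 3))
```

Scanning `n_out` this way mirrors the idea of watching RMSEC, RMSECV, and the selected LV number for the point at which they become unstable; the sizes tried here are arbitrary and would need to be adapted to the data set at hand.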
Keywords
Leave-one-out cross validation, Cross validation, Partial least squares, Near-infrared spectra, Monte Carlo cross validation
Journal title
Chemometrics and Intelligent Laboratory Systems
Serial Year
2006
Record number
1461622