Title :
Bias of error rates in linear discriminant analysis caused by feature selection and sample size
Author :
Schulerud, Helene
Author_Institution :
Dept. of Inf., Oslo Univ., Norway
Abstract :
The holdout and leave-one-out error estimates for a two-class problem with multivariate normal distributions and common covariance are derived as a function of the number of feature candidates, classifier dimensionality, sample size and Mahalanobis distance, using Monte Carlo simulations. It is demonstrated that the leave-one-out error rate is a highly biased estimate of the true error if feature selection is performed on the same data before error estimation. This problem is especially pronounced when analyzing many features on a small data set. The holdout error, by contrast, is an almost unbiased estimate of the true error, independent of the number of feature candidates.
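The effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the author's code or experimental setup. It assumes an identity covariance matrix, a class mean shift on the first feature only (so the Mahalanobis distance equals `delta`), a simple t-like ranking as the feature selector, and a freshly drawn test set standing in for the holdout data. All function names and default parameters are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Fit a two-class linear discriminant with a pooled covariance estimate."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (common covariance assumption).
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    S = S + 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b


def predict(w, b, X):
    """Assign class 1 where the discriminant score is positive."""
    return (X @ w + b > 0).astype(int)


def select_features(X, y, d):
    """Keep the d features with the largest absolute t-like statistic."""
    X0, X1 = X[y == 0], X[y == 1]
    diff = np.abs(X1.mean(axis=0) - X0.mean(axis=0))
    se = np.sqrt(X0.var(axis=0, ddof=1) / len(X0) + X1.var(axis=0, ddof=1) / len(X1))
    return np.argsort(-diff / se)[:d]


def simulate(n_per_class=10, n_candidates=50, d=3, delta=0.0,
             n_test=1000, n_trials=100, seed=0):
    """Return (mean leave-one-out error, mean holdout error) over n_trials."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(n_candidates)
    shift[0] = delta  # only the first feature separates the classes
    y = np.repeat([0, 1], n_per_class)
    y_test = np.repeat([0, 1], n_test // 2)
    loo_errors, holdout_errors = [], []
    for _ in range(n_trials):
        X = rng.standard_normal((len(y), n_candidates)) + np.outer(y, shift)
        # Feature selection on ALL design samples, before error estimation.
        idx = select_features(X, y, d)
        Xs = X[:, idx]
        # Leave-one-out on the already selected features (the biased setup).
        wrong = 0
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            w, b = fit_lda(Xs[keep], y[keep])
            wrong += int(predict(w, b, Xs[i:i + 1])[0] != y[i])
        loo_errors.append(wrong / len(y))
        # Holdout: test samples that played no part in selection or training.
        X_test = rng.standard_normal((len(y_test), n_candidates)) + np.outer(y_test, shift)
        w, b = fit_lda(Xs, y)
        holdout_errors.append(np.mean(predict(w, b, X_test[:, idx]) != y_test))
    return float(np.mean(loo_errors)), float(np.mean(holdout_errors))


if __name__ == "__main__":
    loo, holdout = simulate()
    print(f"leave-one-out: {loo:.3f}  holdout: {holdout:.3f}")
```

With `delta=0.0` the classes are identical, so the true error of any classifier is 0.5. In this sketch the holdout estimate should stay close to that value, while the leave-one-out estimate computed after selection comes out noticeably lower. Moving `select_features` inside the leave-one-out loop is one way to check how much of the gap is due to selection on the same data.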
Keywords :
error statistics; estimation theory; feature extraction; normal distribution; pattern classification; Mahalanobis distance; dimensionality; error estimation; error rate bias; holdout error; leave-one-out error; linear discriminant analysis; normal distributions; Error analysis; Gaussian distribution; Hospitals; Informatics; Linear discriminant analysis; Pathology; Pattern recognition; Standards development; Testing; Training data
Conference_Title :
Pattern Recognition, 2000. Proceedings. 15th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
0-7695-0750-6
DOI :
10.1109/ICPR.2000.906090