Title :
Relationship between the accuracy of classifier error estimation and distribution complexity
Author :
Atashpaz-Gargari, Esmaeil ; Sima, Chao ; Braga-Neto, Ulisses M. ; Dougherty, Edward R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
Abstract :
Error estimation is a crucial part of any classification problem, and it becomes problematic with small samples. In this paper, we analyze the performance of several widely used error estimation methods as a function of the complexity of the feature-label distribution: resubstitution, 10-fold cross-validation with repetition (CV10r), leave-one-out (LOO), bootstrap .632, and bolstered resubstitution. Our definition of complexity takes into account both the complexity of the Bayes decision surface and the Bayes error. We define distribution complexity for a class of Gaussian mixture models; in this class, the Bayes classifier is piecewise linear, and the complexity of its decision surface is included in our definition. Based on this measure of complexity, we perform experiments on 2-dimensional and 3-dimensional problems, applying the different error estimation methods to distributions of different complexities. The bias and root-mean-squared (RMS) error of each estimator are used to analyze its performance. The simulation results show that all the estimation methods lose accuracy as complexity increases, and this performance loss is quantified as a function of distribution complexity.
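The estimators and the bias/RMS criteria named in the abstract can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' experimental code, and it does not implement the paper's complexity measure. Everything specific in it is hypothetical: a 2-D two-class Gaussian mixture (`MEANS`), a nearest-mean linear classifier (the abstract does not name the classifier), sample size n = 20, B = 100 bootstrap resamples, 10 repetitions of 10-fold CV, and a fixed bolstering width `sigma` (published bolstered resubstitution chooses the kernel width from the data). Because the assumed classifier is linear and every mixture component is a spherical Gaussian, both the true error and the bolstered kernel mass reduce to normal CDFs, so no Monte Carlo integration is needed. Bias and RMS are computed as the mean and root-mean-square of (estimate minus true error) over repeated samples.

```python
import math
import random
import statistics

# Hypothetical 2-D distribution: two equiprobable classes, each an equal-weight mixture of N(mu, I).
MEANS = {0: [(0.0, 0.0), (0.0, 2.0)], 1: [(2.0, 1.0)]}

def phi(z):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample(n, rng):
    """Balanced sample (labels alternate); each point is drawn from a random mixture component."""
    out = []
    for i in range(n):
        mx, my = rng.choice(MEANS[i % 2])
        out.append(((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)), i % 2))
    return out

def fit(data):
    """Assumed classifier: nearest-mean rule, written as 'label 1 iff w.x + b > 0'."""
    m = {c: [sum(v) / len(v) for v in zip(*(x for x, y in data if y == c))] for c in (0, 1)}
    w = (m[1][0] - m[0][0], m[1][1] - m[0][1])
    # Boundary = perpendicular bisector of the class means: b = -w.(m0 + m1) / 2.
    b = -(w[0] * (m[0][0] + m[1][0]) + w[1] * (m[0][1] + m[1][1])) / 2.0
    return w, b

def n_wrong(clf, points):
    """Number of points in `points` misclassified by the linear rule `clf`."""
    (w0, w1), b = clf
    return sum(int(w0 * x[0] + w1 * x[1] + b > 0) != y for x, y in points)

def true_error(clf):
    """Exact error under MEANS: for each component, w.x + b ~ N(w.mu + b, |w|^2)."""
    (w0, w1), b = clf
    norm, err = math.hypot(w0, w1), 0.0
    for y, comps in MEANS.items():
        for mx, my in comps:
            s = (w0 * mx + w1 * my + b) / norm
            err += 0.5 / len(comps) * (phi(-s) if y == 1 else phi(s))
    return err

def resub(data, rng=None):
    return n_wrong(fit(data), data) / len(data)

def loo(data, rng=None):
    n = len(data)
    return sum(n_wrong(fit(data[:i] + data[i + 1:]), [data[i]]) for i in range(n)) / n

def cv10r(data, rng, k=10, reps=10):
    """k-fold cross-validation averaged over `reps` random fold partitions."""
    n, total = len(data), 0.0
    for _ in range(reps):
        idx = rng.sample(range(n), n)  # random permutation of the indices
        for f in range(k):
            test = set(idx[f::k])
            train = [data[j] for j in range(n) if j not in test]
            total += n_wrong(fit(train), [data[j] for j in test]) / n
    return total / reps

def boot632(data, rng, B=100):
    """0.368 * resubstitution + 0.632 * pooled out-of-bag (e0) bootstrap error."""
    n, wrong, count = len(data), 0, 0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        if len({data[j][1] for j in idx}) < 2:
            continue  # skip resamples that contain only one class
        oob = [data[j] for j in set(range(n)) - set(idx)]
        wrong += n_wrong(fit([data[j] for j in idx]), oob)
        count += len(oob)
    e0 = wrong / count if count else resub(data)
    return 0.368 * resub(data) + 0.632 * e0

def bolstered_resub(data, rng=None, sigma=0.5):
    """Resubstitution with a N(x, sigma^2 I) kernel at each point (fixed sigma is a
    simplification). For a linear rule the wrong-side kernel mass is a normal CDF."""
    (w0, w1), b = fit(data)
    norm, total = math.hypot(w0, w1), 0.0
    for x, y in data:
        s = (w0 * x[0] + w1 * x[1] + b) / norm  # signed distance to the boundary
        total += phi(-s / sigma) if y == 1 else phi(s / sigma)
    return total / len(data)

ESTIMATORS = {"resub": resub, "loo": loo, "cv10r": cv10r,
              "boot632": boot632, "bolstered": bolstered_resub}

def bias_rms(estimator, n=20, trials=200, seed=0):
    """Bias = E[est - true], RMS = sqrt(E[(est - true)^2]) over repeated samples."""
    rng = random.Random(seed)
    dev = []
    for _ in range(trials):
        data = sample(n, rng)
        dev.append(estimator(data, rng) - true_error(fit(data)))
    return statistics.mean(dev), math.sqrt(statistics.mean([d * d for d in dev]))

if __name__ == "__main__":
    for name, est in ESTIMATORS.items():
        print(name, "bias=%+.4f RMS=%.4f" % bias_rms(est))
```

Running the script prints a bias and RMS value for each estimator; these numbers depend entirely on the assumed distribution, classifier, and seed, and are not results from the paper. The closed-form true error is the reason a linear classifier with spherical Gaussian components was chosen here: it removes Monte Carlo noise from the "true" side of the bias and RMS comparison.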
Keywords :
Bayes methods; Gaussian processes; computational complexity; mean square error methods; pattern classification; 10-fold cross validation with repetition; 2-dimensional problems; 3-dimensional problems; Bayes classifier; Bayes decision surface; Bayes error; Gaussian mixture models; bias error; bolstered resubstitution; classification problem; classifier error estimation accuracy; feature-label distribution complexity; leave-one-out; bootstrap .632; piecewise linear classifier; root-mean-squared error; Bioinformatics; Complexity theory; Error analysis; Genomics; Hidden Markov models; Measurement uncertainty; Three-dimensional displays;
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on
Conference_Location :
San Antonio, TX
Print_ISBN :
978-1-4673-0491-7
ISSN :
2150-3001
DOI :
10.1109/GENSiPS.2011.6169466