Title :
Effects of partial reporting of classification results
Author :
Yousefi, Mohammadmahdi R. ; Hua, Jianping ; Sima, Chao ; Dougherty, Edward R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
Abstract :
When proposing a new classification scheme, perhaps in the form of a classification rule or feature selection method, modelers in the bioinformatics literature typically report its performance on data sets of interest, such as gene-expression microarrays. These data sets often include thousands of features but only a small number of sample points, which increases the variability of feature selection and error estimation and makes the reported performances highly imprecise. This suggests that, if only the best results are reported, the reported performance of the proposed scheme will be less correlated with the actual performance and heavily biased relative to it. This paper confirms this through a large simulation study on both modeled and real data, showing how the joint distributions of the minimum reported estimated errors and the corresponding true errors behave as functions of the number of samples tested.
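The selection effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design: the uniform range of true errors, the binomial holdout-style error estimate, the test-set size, and the function name `mean_bias_of_min_report` are all assumptions chosen for illustration. In each trial several data sets are evaluated, only the smallest estimated error is "reported", and that value is compared with the true error of the same data set.

```python
import random
import statistics


def mean_bias_of_min_report(num_datasets, test_size=30, trials=2000, seed=0):
    """Average (reported estimate - true error) when only the best result is kept.

    Hypothetical toy model, not the paper's setup: each data set has a true
    error drawn uniformly from [0.15, 0.35], and its estimated error is the
    fraction of misclassified points among `test_size` independent test points.
    """
    rng = random.Random(seed)
    biases = []
    for _ in range(trials):
        best_est = None
        best_true = None
        for _ in range(num_datasets):
            true_err = rng.uniform(0.15, 0.35)
            # Each test point is misclassified with probability true_err.
            mistakes = sum(rng.random() < true_err for _ in range(test_size))
            est_err = mistakes / test_size
            # Keep only the smallest estimate, as in partial reporting.
            if best_est is None or est_err < best_est:
                best_est = est_err
                best_true = true_err
        biases.append(best_est - best_true)
    return statistics.mean(biases)


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, round(mean_bias_of_min_report(m), 4))
```

Under these assumptions the estimate is unbiased when a single data set is reported, and the reported minimum becomes increasingly optimistic as more data sets are tried and only the best one is kept.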
Keywords :
bioinformatics; biological techniques; data handling; pattern classification; classification result partial reporting effects; classification rule; classification scheme; feature selection method; gene expression microarray data set; joint distribution behavior; Correlation; Data models; Error analysis; Genomics; Joints; Training
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2010 IEEE International Workshop on
Conference_Location :
Cold Spring Harbor, NY
Print_ISBN :
978-1-61284-791-7
DOI :
10.1109/GENSIPS.2010.5719688