DocumentCode :
2600796
Title :
Effects of partial reporting of classification results
Author :
Yousefi, Mohammadmahdi R. ; Hua, Jianping ; Sima, Chao ; Dougherty, Edward R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
fYear :
2010
fDate :
10-12 Nov. 2010
Firstpage :
1
Lastpage :
4
Abstract :
When proposing a new classification scheme, perhaps in the form of a classification rule or feature selection method, modelers in the bioinformatics literature typically report its performance on data sets of interest, such as gene-expression microarrays. These data sets often include thousands of features but only a small number of sample points, which increases variability in feature selection and error estimation and makes the reported performances highly imprecise. This suggests that, if only the best results are reported, the reported performance of the proposed scheme will be less correlated with, and highly biased relative to, its actual performance. This paper confirms this conjecture through a large simulation study using both modeled and real data, showing how the joint distribution of the minimum reported estimated error and the corresponding true error behaves as a function of the number of samples tested.
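The selection effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design: the Gaussian model for true errors (mean 0.25, standard deviation 0.03), the hold-out estimate on `n_test` points, and the function name `simulate_min_report` are all assumptions made for illustration only. For each repetition, `m` samples are tried, only the smallest estimated error is "reported", and the true error of that same classifier is recorded alongside it.

```python
import random


def simulate_min_report(m, n_test=30, reps=2000, seed=0):
    """Return (mean minimum estimated error, mean corresponding true error).

    All distributional choices here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    sum_est = 0.0
    sum_true = 0.0
    for _ in range(reps):
        best_est, best_true = None, None
        for _ in range(m):
            # Assumed true error of the classifier designed on this sample.
            true_err = min(max(rng.gauss(0.25, 0.03), 0.0), 1.0)
            # Hold-out style estimate: fraction of n_test points misclassified.
            wrong = sum(rng.random() < true_err for _ in range(n_test))
            est_err = wrong / n_test
            # Keep only the best-looking (smallest) estimate, as in partial reporting.
            if best_est is None or est_err < best_est:
                best_est, best_true = est_err, true_err
        sum_est += best_est
        sum_true += best_true
    return sum_est / reps, sum_true / reps


if __name__ == "__main__":
    for m in (1, 5, 20):
        est, true = simulate_min_report(m)
        print(f"m={m:2d}  mean min estimate={est:.3f}  mean true error={true:.3f}")
```

Under these assumptions, the gap between the reported minimum estimate and the corresponding true error should widen as `m` grows, which is the qualitative behavior the abstract describes.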
Keywords :
bioinformatics; biological techniques; data handling; pattern classification; classification result partial reporting effects; classification rule; classification scheme; feature selection method; gene expression microarray data set; joint distribution behavior; Bioinformatics; Correlation; Data models; Error analysis; Genomics; Joints; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2010 IEEE International Workshop on
Conference_Location :
Cold Spring Harbor, NY
ISSN :
2150-3001
Print_ISBN :
978-1-61284-791-7
Type :
conf
DOI :
10.1109/GENSIPS.2010.5719688
Filename :
5719688