DocumentCode :
3085849
Title :
Toward a measure of classification complexity in gene expression signatures
Author :
Kamath, Vidya ; Yeatman, Timothy J. ; Eschrich, Steven A.
Author_Institution :
Biomedical Engineering program at the University of South Florida, Tampa, USA
fYear :
2008
fDate :
20-25 Aug. 2008
Firstpage :
5704
Lastpage :
5707
Abstract :
Gene expression signatures identify important genes that predict a specified outcome. In several notable diseases such as leukemia and breast cancer, the results have been encouraging. In these datasets, many techniques work well when discriminating particular outcomes. However, these same methods, applied to other datasets, are unable to achieve similar levels of success. Given the small sample sizes common to these studies and the large dimensionality of the data, several key issues exist when attempting to construct reliable, reproducible gene signatures. The classifiers may not be sufficient to discriminate classes, or the data itself may not be sufficient to produce effective separation. In this paper, three simple measures of classification complexity are considered to explore a limit to the predictive accuracy that can be achieved in a dataset. Two independent gene expression datasets (lung and colorectal cancer) are considered, using three different outcomes on each dataset. Four different classifiers, using the t-test for feature selection, were tested on these datasets as a representative panel of classifiers. Our results indicate that Fisher's discriminant ratio provides a good measure of the complexity of the classification problem, with a high correlation between complexity and best classification accuracy across all problems (R² = 0.78). Specifically, predicting gender is a low-complexity problem, as indicated both by the complexity measure and the classification results. More clinically oriented endpoints are more complex and have lower classification accuracies. Therefore, classification complexity can be used to estimate the maximum attainable accuracy for a problem, reducing the need to evaluate many different classifiers.
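As an illustration of the kind of measure the abstract refers to, the following is a minimal sketch of Fisher's discriminant ratio for a two-class problem, taken as the maximum over features of (difference of class means)² / (sum of class variances). This per-feature form is the one commonly used in the classification-complexity literature; whether the paper computes it in exactly this way is an assumption, and the function name `fisher_discriminant_ratio` is hypothetical.

```python
import numpy as np


def fisher_discriminant_ratio(X, y):
    """Maximum per-feature Fisher discriminant ratio for two classes.

    X: (n_samples, n_features) expression matrix.
    y: (n_samples,) labels with exactly two distinct values.
    Assumption: population variance (ddof=0) is used, and features whose
    summed class variance is zero are assigned a ratio of 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")

    a = X[y == classes[0]]
    b = X[y == classes[1]]

    # Squared difference of class means, per feature.
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    # Sum of the within-class variances, per feature.
    den = a.var(axis=0) + b.var(axis=0)

    # Avoid division by zero for constant features.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(ratios.max())
```

A larger value suggests that at least one gene separates the two classes well (a lower-complexity problem); values near zero suggest heavy class overlap on every individual feature.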
Keywords :
Accuracy; Biomarkers; Breast cancer; Cells (biology); Diseases; Drugs; Gene expression; Lungs; Testing; Tumors; Algorithms; Artificial Intelligence; Diagnosis, Computer-Assisted; Gene Expression Profiling; Humans; Neoplasm Proteins; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Sample Size; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Tumor Markers, Biological;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE
Conference_Location :
Vancouver, BC
ISSN :
1557-170X
Print_ISBN :
978-1-4244-1814-5
Electronic_ISBN :
1557-170X
Type :
conf
DOI :
10.1109/IEMBS.2008.4650509
Filename :
4650509