Title :
Improved Bolstering Error Estimation for Gene Ranking
Author :
Huynh, K.N.T. ; Phan, J.H. ; Vo, T.M. ; Wang, M.D.
Author_Institution :
Georgia Inst. of Technol. & Emory Univ., Atlanta
Abstract :
Many methods have been proposed to identify differentially expressed genes in diseased tissues. The performance of such a method is closely tied to the metric used to evaluate it. We examine several error estimation algorithms (cross validation, bootstrap, resubstitution, and resubstitution with bolstering) for three classifiers (support vector machine, Fisher's discriminant, and signed distance function). To control classifier overfitting, which is usually caused by the small sample sizes of many real datasets, we generate synthetic datasets based on real data. This lets us monitor the impact of sample size when evaluating the metrics. We find that resubstitution with bolstering gives the best results, especially with respect to computational efficiency. However, classical bolstering tends to be biased in high dimensions. We therefore investigate ways to reduce the bias of the bolstering estimate without increasing its computational cost. Our results indicate that the estimator tends to become unbiased as the sample size increases. We also find that modified bolstering is the best among all metrics in terms of estimation accuracy and computational efficiency.
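As a rough illustration of the difference between plain resubstitution and resubstitution with bolstering, the sketch below compares the two on a tiny hand-made dataset. It is not the authors' implementation: the nearest-mean linear classifier stands in for the classifiers named in the abstract, the function names are hypothetical, and the kernel width `sigma` is an assumed free parameter rather than the data-driven value a real bolstering estimator would compute. It relies on the standard result that, for a linear boundary and a spherical Gaussian kernel centered on a training point, the kernel mass on the wrong side of the boundary has a closed form, so no Monte Carlo sampling is needed.

```python
import math
import numpy as np


def fit_nearest_mean(X, y):
    """Fit a linear rule: predict class 1 when w.x + b > 0, else class 0."""
    m0 = X[y == 0].mean(axis=0)
    m1 = X[y == 1].mean(axis=0)
    w = m1 - m0
    b = -0.5 * float(w @ (m0 + m1))
    return w, b


def resubstitution_error(X, y, w, b):
    """Fraction of training points the fitted rule misclassifies."""
    pred = (X @ w + b > 0).astype(int)
    return float(np.mean(pred != y))


def bolstered_resubstitution_error(X, y, w, b, sigma):
    """Average Gaussian kernel mass that falls on the wrong side of the boundary.

    sigma is an assumed, user-supplied kernel standard deviation.
    """
    signs = np.where(y == 1, 1.0, -1.0)
    # Signed distance to the boundary in units of sigma (positive = correct side).
    margins = signs * (X @ w + b) / (sigma * np.linalg.norm(w))
    # Standard normal CDF evaluated at -margin, written with math.erf.
    probs = [0.5 * (1.0 + math.erf(-m / math.sqrt(2.0))) for m in margins]
    return float(np.mean(probs))


# Hypothetical toy data: two well-separated classes on a line.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
y_demo = np.array([0, 0, 1, 1])

w_demo, b_demo = fit_nearest_mean(X_demo, y_demo)
resub = resubstitution_error(X_demo, y_demo, w_demo, b_demo)
bolstered = bolstered_resubstitution_error(X_demo, y_demo, w_demo, b_demo, sigma=1.0)
print("resubstitution:", resub)
print("bolstered resubstitution:", bolstered)
```

On this toy data the plain resubstitution estimate is zero because every training point is classified correctly, while the bolstered estimate is positive because part of each kernel spills across the boundary. This is the sense in which bolstering counteracts the optimism of resubstitution at essentially the same computational cost.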
Keywords :
biological tissues; error compensation; genetics; support vector machines; Fisher's discriminant; bolstering error estimation; computational efficiency; data overfitting problem; diseased tissues; error estimation algorithms; estimation accuracy; gene ranking; signed distance function; support vector machine; Computational efficiency; Diseases; Error analysis; Gene expression; Robustness; Smoothing methods; Support vector machine classification; Support vector machines; Testing; Training data; Animals; Computer Simulation; Gene Expression Profiling; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis; Selection Bias; Sensitivity and Specificity; Software
Conference_Title :
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE
Conference_Location :
Lyon
Print_ISBN :
978-1-4244-0787-3
DOI :
10.1109/IEMBS.2007.4353372