• DocumentCode
    1846152
  • Title

    Improved Bolstering Error Estimation for Gene Ranking

  • Author

    Huynh, K.N.T. ; Phan, J.H. ; Vo, T.M. ; Wang, M.D.

  • Author_Institution
    Georgia Inst. of Technol. & Emory Univ., Atlanta
  • fYear
    2007
  • fDate
    22-26 Aug. 2007
  • Firstpage
    4633
  • Lastpage
    4636
  • Abstract
    Many methods have been proposed to identify differentially expressed genes in diseased tissues. The performance of such a method is closely related to the evaluation metric. We examine several error estimation algorithms (i.e., cross validation, bootstrap, resubstitution, and resubstitution with bolstering) for three classifiers (i.e., support vector machine, Fisher's discriminant, and signed distance function). To control the classifier's data-overfitting problem, usually caused by the small sample size of many real datasets, we generate synthetic datasets based on real data. This way, we can monitor the impact of sample size when evaluating the metrics. We find that resubstitution with bolstering gives the best result, especially with respect to computational efficiency. However, classical bolstering tends to be biased in high dimensions. Thus, we further investigate ways to reduce bolstering estimation bias without increasing computational intensity. Results of our investigation indicate that the estimator tends to become unbiased as the sample size increases. We also find that modified bolstering is the best among all metrics in terms of estimation accuracy and computational efficiency.
  • Keywords
    biological tissues; error compensation; genetics; support vector machines; Fisher's discriminant; bolstering error estimation; computational efficiency; data overfitting problem; diseased tissues; error estimation algorithms; estimation accuracy; gene ranking; signed distance function; support vector machine; Computational efficiency; Diseases; Error analysis; Gene expression; Robustness; Smoothing methods; Support vector machine classification; Support vector machines; Testing; Training data; Animals; Computer Simulation; Gene Expression Profiling; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis; Selection Bias; Sensitivity and Specificity; Software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE
  • Conference_Location
    Lyon
  • ISSN
    1557-170X
  • Print_ISBN
    978-1-4244-0787-3
  • Type

    conf

  • DOI
    10.1109/IEMBS.2007.4353372
  • Filename
    4353372