Title :
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
Author :
Whalen, Sean ; Pandey, G.K.
Author_Institution :
Dept. of Genetics & Genomic Sci., Icahn Inst. for Genomics & Multiscale Biol., New York, NY, USA
Abstract :
The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection, all using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across four genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.
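As a rough illustration of two of the methods named in the abstract, the sketch below contrasts simple aggregation (averaging base-classifier scores) with a greedy forward ensemble selection that adds members, with replacement, while a validation score improves. It is not the authors' implementation: the toy scores in `preds`, the `labels`, the accuracy metric, the 0.5 threshold, and all function names are assumptions made for illustration only.

```python
from statistics import mean

def aggregate(preds, members):
    """Simple aggregation: average the scores of the listed base classifiers."""
    n = len(preds[members[0]])
    return [mean(preds[m][i] for m in members) for i in range(n)]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def ensemble_selection(preds, labels, max_size=5):
    """Greedy forward selection with replacement on a validation set.

    At each step, add the base classifier whose inclusion gives the best
    averaged-ensemble accuracy; stop when no addition improves the score.
    """
    chosen, best = [], -1.0
    for _ in range(max_size):
        candidates = [
            (accuracy(aggregate(preds, chosen + [m]), labels), m)
            for m in sorted(preds)
        ]
        score, member = max(candidates, key=lambda c: c[0])
        if score <= best:
            break
        chosen.append(member)
        best = score
    return chosen, best

# Hypothetical validation labels and base-classifier scores (toy data).
labels = [1, 1, 0, 0, 1, 0]
preds = {
    "a": [0.9, 0.4, 0.2, 0.6, 0.8, 0.1],
    "b": [0.6, 0.9, 0.7, 0.1, 0.3, 0.2],
    "c": [0.3, 0.7, 0.4, 0.3, 0.9, 0.6],
}

all_members_acc = accuracy(aggregate(preds, ["a", "b", "c"]), labels)
chosen, best = ensemble_selection(preds, labels)
print("aggregate of all members:", all_members_acc)
print("selected members:", chosen, "validation accuracy:", best)
```

On this toy data each base classifier errs on different examples, so averaging a small, diverse subset can match the full ensemble. Meta-learning (stacking) would instead train a second-level classifier on the base scores rather than averaging them.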
Keywords :
bioinformatics; genomics; pattern classification; proteins; cluster-based meta-learning; comparative analysis; ensemble classifier; ensemble diversity; ensemble methods; ensemble performance; ensemble selection; genetic interaction prediction; heterogeneous classifiers; meta-learning; protein functions; simple aggregation; accuracy; diversity reception; stacking; supervised learning
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.21