DocumentCode :
3673657
Title :
Observing the Effect of the Choice of Classifier on Bioinformatics Data with Varying Levels of Data Quality and Class Balance
Author :
Alireza Fazelpour;Taghi M. Khoshgoftaar;David J. Dittman;Ahmad Abu Shanab
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2015
Firstpage :
372
Lastpage :
379
Abstract :
Noise, meaning erroneous or missing data, is a prominent challenge in many bioinformatics datasets. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques such as supervised classification algorithms and feature selection techniques. Identifying and quantifying noise are also challenging tasks, and a proper mechanism is needed to manage noise in order to improve the performance of classifiers and feature selection methods. Class imbalance, which occurs when one class has many more instances than the other class(es), is another challenging characteristic of such data. In this study, we investigate the effects of class noise on the classification performance of various learners using multiple derived datasets with varying degrees of data quality and class imbalance. To this end, we conducted experiments using a filter-based subset selection method applied to multiple derived datasets, which were generated by injecting artificial class noise in a controlled manner to create three levels of data quality: High-Quality, Average-Quality, and Low-Quality. Our results, along with statistical analysis, show that Random Forest outperforms the other learners, without exception, at all levels of class balance and data quality. Therefore, we recommend Random Forest as a noise-tolerant and robust classifier when dealing with bioinformatics datasets of varying quality.
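The controlled class-noise injection mentioned in the abstract can be illustrated with a minimal sketch. It assumes binary 0/1 labels and two hypothetical parameters, noise_rate (the fraction of all labels flipped) and minority_share (the fraction of those flips drawn from the minority class). The function name and both parameters are illustrative assumptions; this record does not state the paper's actual injection procedure or the noise settings behind its High-, Average-, and Low-Quality datasets.

```python
import random
from typing import List, Sequence


def inject_class_noise(labels: Sequence[int], noise_rate: float,
                       minority_share: float, minority_label: int = 1,
                       seed: int = 0) -> List[int]:
    """Hypothetical sketch: flip a controlled number of binary (0/1) labels.

    noise_rate:     assumed fraction of all instances whose label is flipped.
    minority_share: assumed fraction of the flipped instances that come from
                    the minority class (the rest come from the majority class).
    """
    if minority_label not in (0, 1):
        raise ValueError("this sketch assumes binary 0/1 labels")
    rng = random.Random(seed)  # fixed seed keeps the derived dataset reproducible
    n_flip = round(noise_rate * len(labels))

    # Split instance indices by class so the noise can be distributed deliberately.
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]

    # Cap each draw at the number of instances actually available in that class.
    n_min = min(round(minority_share * n_flip), len(minority_idx))
    n_maj = min(n_flip - n_min, len(majority_idx))
    chosen = rng.sample(minority_idx, n_min) + rng.sample(majority_idx, n_maj)

    # Flip the selected labels (0 -> 1, 1 -> 0) in a copy; the input is untouched.
    noisy = list(labels)
    for i in chosen:
        noisy[i] = 1 - noisy[i]
    return noisy


# Usage: an imbalanced toy dataset with 20 minority (1) and 80 majority (0) labels.
labels = [1] * 20 + [0] * 80
noisy = inject_class_noise(labels, noise_rate=0.1, minority_share=0.5, seed=0)
print(sum(a != b for a, b in zip(labels, noisy)))  # number of flipped labels
```

Raising noise_rate in a sketch like this yields progressively lower-quality derived datasets from the same source data. Fixing the seed means each derived dataset can be regenerated exactly when comparing learners.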
Keywords :
"Noise","Bioinformatics","Data models","Biological system modeling","Training","Robustness","Vegetation"
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/IRI.2015.63
Filename :
7301001