DocumentCode :
3114973
Title :
A New Performance Measure for Class Imbalance Learning. Application to Bioinformatics Problems
Author :
Batuwita, Rukshan ; Palade, Vasile
Author_Institution :
Comput. Lab., Univ. of Oxford, Oxford, UK
fYear :
2009
fDate :
13-15 Dec. 2009
Firstpage :
545
Lastpage :
550
Abstract :
In class imbalance learning, the performance measure used for model selection plays a vital role. Past research has shown that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. To overcome this problem, other performance measures, such as the geometric mean (Gm) and the F-measure (Fm), have been used for imbalanced dataset learning. Training a classifier on an imbalanced dataset (where the positive class is the minority class) usually produces sub-optimal models with a higher specificity (SP) and a lower sensitivity (SE). Applying class imbalance learning methods can often increase SE at the cost of some SP. In some types of real-world imbalanced classification problems, such as gene-finding problems in bioinformatics, it is important to increase SE as much as possible while keeping the reduction in SP to a minimum. In this paper, we show that for this type of classification problem the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. To circumvent this problem, we introduce a new performance measure, called the adjusted geometric mean (AGm). We show, both analytically and empirically on two real-world bioinformatics datasets, that AGm can perform better than the Gm and Fm metrics.
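The measures named in the abstract can be illustrated with a small sketch. SE, SP, Gm and Fm below use their standard confusion-matrix definitions. The AGm formula, (Gm + SP * Nn) / (1 + Nn) when SE > 0 and 0 otherwise, with Nn the proportion of negative examples, is an assumption recalled from the full paper, not stated in this abstract, and should be checked against it. The `Confusion` type, the helper names and the two confusion matrices (100 positives, 900 negatives) are hypothetical and are not taken from the paper's datasets.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix; the positive class is the minority class."""
    tp: int
    fn: int
    tn: int
    fp: int


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class or prediction set is empty.
    return num / den if den else 0.0


def sensitivity(c: Confusion) -> float:
    # SE: fraction of positive (minority) examples classified correctly.
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # SP: fraction of negative (majority) examples classified correctly.
    return _ratio(c.tn, c.tn + c.fp)


def g_mean(c: Confusion) -> float:
    # Gm: geometric mean of SE and SP.
    return math.sqrt(sensitivity(c) * specificity(c))


def f_measure(c: Confusion) -> float:
    # Fm: harmonic mean of precision and recall (recall equals SE).
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = sensitivity(c)
    return _ratio(2 * precision * recall, precision + recall)


def adjusted_g_mean(c: Confusion) -> float:
    # ASSUMED AGm formula (verify against the full paper):
    # (Gm + SP * Nn) / (1 + Nn) if SE > 0, else 0, where Nn is the
    # proportion of negative examples in the evaluated dataset.
    if sensitivity(c) == 0:
        return 0.0
    n_neg = _ratio(c.tn + c.fp, c.tp + c.fn + c.tn + c.fp)
    return (g_mean(c) + specificity(c) * n_neg) / (1 + n_neg)


# Hypothetical models on 100 positives and 900 negatives.
model_a = Confusion(tp=80, fn=20, tn=810, fp=90)   # SE = 0.80, SP = 0.90
model_b = Confusion(tp=85, fn=15, tn=765, fp=135)  # SE = 0.85, SP = 0.85

for name, m in (("A", model_a), ("B", model_b)):
    print(f"{name}: SE={sensitivity(m):.2f} SP={specificity(m):.2f} "
          f"Gm={g_mean(m):.4f} Fm={f_measure(m):.4f} AGm={adjusted_g_mean(m):.4f}")
```

In this toy case Gm ranks model B slightly higher, although B buys 5 extra true positives with 45 extra false positives. The assumed AGm weights SP by the share of negatives and ranks model A higher. This is the kind of trade-off the abstract describes for problems where SP reductions must stay small.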
Keywords :
bioinformatics; classification; learning (artificial intelligence); adjusted geometric-mean; class imbalance learning; imbalanced dataset learning; suboptimal classification models; Costs; Data processing; Electronic mail; Laboratories; Learning systems; Machine learning; Performance analysis; Predictive models; Proteins; Model Selection; Performance Measures; SVMs
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
978-0-7695-3926-3
Type :
conf
DOI :
10.1109/ICMLA.2009.126
Filename :
5381421