Title :
Imbalanced classification using genetically optimized cost sensitive classifiers
Author :
Perry, Todd ; Bader-El-Den, Mohamed ; Cooper, Steven
Author_Institution :
School of Computing, University of Portsmouth, Portsmouth PO1 3HE, UK
Abstract :
Classification is one of the most researched problems in machine learning; since the 1960s, a myriad of techniques have been proposed. The purpose of a classification algorithm, also known as a ‘classifier’, is to identify which class, or category, an observation belongs to. In many real-world scenarios, datasets suffer from class imbalance, where the observations belonging to one class greatly outnumber those belonging to the other classes. Class imbalance has been shown to hinder the performance of classifiers, and several techniques have been developed to improve the performance of classifiers on imbalanced data. Using a cost matrix is one such technique; however, it requires the matrix to be either pre-defined or manually optimized. This paper proposes an approach for automatically generating optimized cost matrices using a genetic algorithm. The genetic algorithm can generate matrices for classification problems with any number of classes and is easy to tailor towards specific use cases. The proposed approach is compared against unoptimized classifiers and alternative cost matrix optimization techniques using a variety of datasets. In addition, the potential of storage system failure prediction datasets provided by Seagate UK is investigated.
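The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so everything below is an assumption made for illustration. The sketch assumes a classifier that already outputs class probabilities, a cost matrix where `cost[i][j]` is the cost of predicting class `j` when the true class is `i`, macro-averaged recall as the fitness, and a simple elitist genetic algorithm with uniform crossover and Gaussian mutation. All function names are hypothetical.

```python
import random

def predict_with_costs(probs, cost):
    """Return the class j minimising the expected cost sum_i probs[i] * cost[i][j]."""
    n = len(cost)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

def macro_recall(cost, prob_rows, labels):
    """Mean per-class recall (assumed fitness; insensitive to class frequency)."""
    n = len(cost)
    hits, totals = [0] * n, [0] * n
    for probs, y in zip(prob_rows, labels):
        totals[y] += 1
        hits[y] += predict_with_costs(probs, cost) == y
    recalls = [h / t for h, t in zip(hits, totals) if t > 0]
    return sum(recalls) / len(recalls)

def uniform_cost(n):
    """Zero cost on the diagonal, unit cost for every kind of error."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

def mutate(cost, rng, scale=0.5):
    """Perturb one off-diagonal entry, keeping it positive."""
    n = len(cost)
    child = [row[:] for row in cost]
    i, j = rng.sample(range(n), 2)  # two distinct indices, so i != j
    child[i][j] = max(0.01, child[i][j] + rng.gauss(0.0, scale))
    return child

def crossover(a, b, rng):
    """Uniform crossover: each entry is copied from one of the two parents."""
    n = len(a)
    return [[a[i][j] if rng.random() < 0.5 else b[i][j] for j in range(n)]
            for i in range(n)]

def evolve_cost_matrix(prob_rows, labels, n_classes,
                       generations=30, pop_size=20, seed=0):
    """Evolve a cost matrix that maximises macro recall on the given data."""
    rng = random.Random(seed)
    fitness = lambda c: macro_recall(c, prob_rows, labels)
    # Seed the population with the uniform matrix so the result is never worse.
    population = [uniform_cost(n_classes)]
    while len(population) < pop_size:
        population.append(mutate(population[-1], rng, scale=2.0))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Because the matrix is stored as a full `n_classes` by `n_classes` grid, the same sketch applies to problems with more than two classes. In practice the fitness would be evaluated on held-out validation data rather than the data used to fit the classifier, and a different fitness (for example one weighted towards a specific class) could be swapped in to tailor the search to a use case.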
Keywords :
Bioinformatics; Drives; Genetic algorithms; Genomics; Sociology; Statistics;
Conference_Title :
Evolutionary Computation (CEC), 2015 IEEE Congress on
Conference_Location :
Sendai, Japan
DOI :
10.1109/CEC.2015.7256956