Title :
Boosting for Learning Multiple Classes with Imbalanced Class Distribution
Author :
Sun, Yanmin ; Kamel, Mohamed S. ; Wang, Yang
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON
Abstract :
Data with an imbalanced class distribution significantly degrades the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This learning difficulty has attracted considerable research interest, but most efforts concentrate on bi-class problems. The bi-class case is not the only scenario in which class imbalance arises, and reported solutions for bi-class applications are not applicable to multi-class problems. In this paper, we develop a cost-sensitive boosting algorithm to improve classification performance on imbalanced data involving multiple classes. One barrier to applying a cost-sensitive boosting algorithm to imbalanced data is that the cost matrix is often unavailable for a problem domain. To solve this problem, we apply a genetic algorithm to search for the optimum cost setup of each class. Empirical tests show that the proposed cost-sensitive boosting algorithm significantly improves classification performance on imbalanced data sets.
Keywords :
data mining; genetic algorithms; learning (artificial intelligence); pattern classification; boosting algorithm; classifier learning algorithm; cost-sensitive boosting algorithm; data classification; genetic algorithm; imbalanced class distribution; multiple classes imbalance learning; Boosting; Classification algorithms; Cost function; Data mining; Drives; Iterative algorithms; Software standards; Software systems; Sun; Testing
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2701-7
DOI :
10.1109/ICDM.2006.29