Title :
Utilizing concept correlations for effective imbalanced data classification
Author :
Yilin Yan ; Yang Liu ; Mei-Ling Shyu ; Min Chen
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
Abstract :
Data imbalance is a common and challenging problem in data mining and machine learning, and it has attracted significant research effort. A data set is considered imbalanced when its data instances (samples) are far from uniformly distributed across the classes/categories. This situation is very common in real-world data sets and is likely to bias classification results. In this paper, a two-phase classification framework is proposed to make the classification of imbalanced data more accurate and stable. The proposed framework is based on the correlations between concepts. The general idea is to identify negative data instances that have certain positive correlations with data instances in the target concept, and to use them to facilitate the classification task. Experimental results comparing our framework with four existing approaches, using four different kinds of feature representations, show that it is effective in imbalanced data classification and robust to the choice of feature descriptor.
Keywords :
data mining; learning (artificial intelligence); pattern classification; biased classification; concept correlations; data mining; effective imbalanced data classification; feature descriptors; feature representations; machine learning; negative data instances; positive correlations; real-world data sets; two-phase classification framework; Classification algorithms; Correlation; Data mining; Data models; Feature extraction; Histograms; Image color analysis; Imbalanced data; classification; correlation; rare class mining; skewed data;
Conference_Title :
Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on
DOI :
10.1109/IRI.2014.7051939