Title :
Online Active Learning with Imbalanced Classes
Author :
Ferdowsi, Zahra ; Ghani, Rayid ; Settimi, Raffaella
Author_Institution :
Sch. of Comput., DePaul Univ., Chicago, IL, USA
Abstract :
This paper proposes an online algorithm for active learning that switches between different candidate instance selection strategies (ISS) for classification on imbalanced data sets. This is important for two reasons: 1) many real-world problems have imbalanced class distributions, and 2) no single ISS always outperforms all the other techniques. We first empirically compare the performance of existing techniques on imbalanced data sets and show that different strategies work better on different data sets, and that some techniques even perform worse than random selection. We then propose an unsupervised score to track and predict the performance of individual instance selection techniques, which allows us to select an effective technique without using a holdout set and thus without wasting valuable labeled data. This score is used in a simple online learning approach that switches between different ISS at each iteration. On the data sets used in this paper, the proposed approach performs better than the best individual strategy available to the online algorithm, and it provides a way to build practical and effective active learning systems for imbalanced data sets.
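The switching idea in the abstract can be illustrated with a small sketch. The record does not define the paper's unsupervised score or its exact switching rule, so everything below is an assumption for illustration: the class StrategySwitcher, its decay and epsilon parameters, the epsilon-greedy selection, and the stand-in score prediction_change (the fraction of unlabeled-pool predictions that flip after retraining, treated here as "higher is better") are all hypothetical and are not the authors' method.

```python
import random
from typing import Dict, List


class StrategySwitcher:
    """Hypothetical sketch: keep a smoothed score per instance selection
    strategy (ISS) and pick one strategy per active-learning iteration."""

    def __init__(self, strategies: List[str], decay: float = 0.7,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.scores: Dict[str, float] = {s: 0.0 for s in strategies}
        self.counts: Dict[str, int] = {s: 0 for s in strategies}
        self.decay = decay      # weight kept from the previous estimate (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.rng = random.Random(seed)

    def choose(self) -> str:
        # Try every strategy once, in list order, before trusting the estimates.
        untried = [s for s, c in self.counts.items() if c == 0]
        if untried:
            return untried[0]
        # With probability epsilon explore a random strategy; otherwise
        # exploit the strategy with the highest smoothed score.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.scores))
        return max(self.scores, key=lambda s: self.scores[s])

    def update(self, strategy: str, score: float) -> None:
        # Exponentially smooth the unsupervised score observed this iteration.
        if self.counts[strategy] == 0:
            self.scores[strategy] = score
        else:
            self.scores[strategy] = (self.decay * self.scores[strategy]
                                     + (1 - self.decay) * score)
        self.counts[strategy] += 1


def prediction_change(before: List[int], after: List[int]) -> float:
    """Hypothetical label-free score: fraction of predictions on the unlabeled
    pool that change after retraining, so no holdout set is needed."""
    if len(before) != len(after) or not before:
        raise ValueError("prediction lists must be non-empty and of equal length")
    return sum(b != a for b, a in zip(before, after)) / len(before)


if __name__ == "__main__":
    switcher = StrategySwitcher(["uncertainty", "density", "random"], epsilon=0.0)
    for score in [0.30, 0.10, 0.05]:
        picked = switcher.choose()  # each strategy is tried once, in list order
        switcher.update(picked, score)
    print(switcher.choose())  # prints "uncertainty"
```

In a real loop, the chosen strategy would select the next instance to label, the classifier would be retrained, and the resulting score would be fed back through update(). Exponential smoothing is one simple way to let recent iterations count more as the best strategy changes over time; it is a design choice of this sketch, not something stated in the record.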
Keywords :
data analysis; learning (artificial intelligence); ISS; imbalanced data sets classification; instance selection strategies; online active learning; random selection; unsupervised score; Correlation; Educational institutions; Learning systems; Measurement; Prediction algorithms; Training; Uncertainty;
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.12