• DocumentCode
    1459406
  • Title

    Active Learning From Stream Data Using Optimal Weight Classifier Ensemble

  • Author

    Zhu, Xingquan ; Zhang, Peng ; Lin, Xiaodong ; Shi, Yong

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
  • Volume
    40
  • Issue
    6
  • fYear
    2010
  • Firstpage
    1607
  • Lastpage
    1621
  • Abstract
    In this paper, we propose a new research problem on active learning from data streams, where data volumes grow continuously, and labeling all data is considered expensive and impractical. The objective is to label a small portion of stream data from which a model is derived to predict future instances as accurately as possible. To tackle the technical challenges raised by the dynamic nature of the stream data, i.e., increasing data volumes and evolving decision concepts, we propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build a classifier ensemble. We argue that a classifier ensemble's variance directly corresponds to its error rate, and reducing a classifier ensemble's variance is equivalent to improving its prediction accuracy. Because of this, one should label instances toward the minimization of the variance of the underlying classifier ensemble. Accordingly, we introduce a minimum-variance (MV) principle to guide the instance labeling process for data streams. In addition, we derive an optimal-weight calculation method to determine the weight values for the classifier ensemble. The MV principle and the optimal weighting module are combined to build an active learning framework for data streams. Experimental results on synthetic and real-world data demonstrate the performance of the proposed work in comparison with other approaches.
  • Keywords
    data handling; data mining; learning (artificial intelligence); pattern classification; classifier-ensemble-based active learning framework; data streams; minimum-variance principle; optimal weight classifier ensemble; Accuracy; Australia; Computer science; Data mining; Decision making; Error analysis; Information management; Information technology; Labeling; Predictive models; Active learning; classifier ensemble; stream data; Algorithms; Artificial Intelligence; Computer Simulation; Decision Support Techniques; Models, Theoretical; Pattern Recognition, Automated; Signal Processing, Computer-Assisted
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type

    jour

  • DOI
    10.1109/TSMCB.2010.2042445
  • Filename
    5440901