  • DocumentCode
    14718
  • Title
    Classification and Adaptive Novel Class Detection of Feature-Evolving Data Streams
  • Author
    Masud, M.M. ; Qing Chen ; Khan, Latifur ; Aggarwal, Charu C. ; Jing Gao ; Jiawei Han ; Srivastava, Anurag ; Oza, N.C.
  • Author_Institution
    Fac. of Inf. Technol., United Arab Emirates Univ., Al-Ain, United Arab Emirates
  • Volume
    25
  • Issue
    7
  • fYear
    2013
  • fDate
    July 2013
  • Firstpage
    1484
  • Lastpage
    1497
  • Abstract
    Data stream classification poses many challenges to the data mining community. In this paper, we address four such major challenges, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occurs as a result of changes in the underlying concepts. Concept-evolution occurs as a result of new classes evolving in the stream. Feature-evolution is a frequently occurring process in many streams, such as text streams, in which new features (i.e., words or phrases) appear as the stream progresses. Most existing data stream classification techniques address only the first two challenges, and ignore the latter two. In this paper, we propose an ensemble classification framework, where each classifier is equipped with a novel class detector, to address concept-drift and concept-evolution. To address feature-evolution, we propose a feature set homogenization technique. We also enhance the novel class detection module by making it more adaptive to the evolving stream, and enabling it to detect more than one novel class at a time. Comparison with state-of-the-art data stream classification techniques establishes the effectiveness of the proposed approach.
  • Keywords
    data mining; media streaming; text analysis; adaptive class detection; concept-drift; concept-evolution; data mining community; data stream classification techniques; ensemble classification framework; feature set homogenization technique; feature-evolution; historical data; infinite length; text streams; Data engineering; Data models; Feature extraction; Heuristic algorithms; Knowledge engineering; Training; Vocabulary; Data stream; novel class; outlier
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.109
  • Filename
    6205751