• DocumentCode
    3126552
  • Title

    Detecting Recurring and Novel Classes in Concept-Drifting Data Streams

  • Author

    Masud, Mohammad M. ; Al-Khateeb, Tahseen M. ; Khan, Latifur ; Aggarwal, Charu ; Gao, Jing ; Han, Jiawei ; Thuraisingham, Bhavani

  • Author_Institution
    Dept. of Comp. Sci., Univ. of Texas at Dallas, Dallas, TX, USA
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    1176
  • Lastpage
    1181
  • Abstract
    Concept-evolution, which occurs when a new class emerges in the stream, is one of the major challenges in data stream classification. This problem remains unaddressed by most state-of-the-art techniques. A recurring class is a special case of concept-evolution: a class appears in the stream, disappears for a long time, and then appears again. Existing data stream classification techniques that address the concept-evolution problem wrongly detect recurring classes as novel classes. This creates two main problems. First, resources are wasted in detecting a recurring class as a novel class, because novel class detection is much more computationally and memory intensive than simply recognizing an existing class. Second, when a novel class is identified, human experts are involved in collecting and labeling the instances of that class for future modeling. If a recurring class is reported as a novel class, human effort is wasted in finding out whether it really is novel. In this paper, we address the recurring class issue and propose a more realistic novel class detection technique, which remembers a class and identifies it as "not novel" when it reappears after a long disappearance. Our approach has shown a significant reduction in classification error over state-of-the-art stream classification techniques on several benchmark data streams.
  • Keywords
    data handling; pattern classification; concept-drifting data streams; concept-evolution; data stream classification techniques; recurring detection; Analytical models; Copper; Data models; Error analysis; Humans; Training; Training data; novel class; recurring class; stream classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2011.49
  • Filename
    6137334