DocumentCode :
1478465
Title :
Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints
Author :
Masud, Mohammad M. ; Gao, Jing ; Khan, Latifur ; Han, Jiawei ; Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Volume :
23
Issue :
6
fYear :
2011
fDate :
6/1/2011 12:00:00 AM
Firstpage :
859
Lastpage :
874
Abstract :
Most existing data stream classification techniques ignore one important aspect of stream data: the arrival of a novel class. We address this issue and propose a data stream classification technique that integrates a novel class detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of concept-drift, when the underlying data distributions evolve in streams. To determine whether an instance belongs to a novel class, the classification model sometimes needs to wait for more test instances to discover similarities among those instances. A maximum allowable wait time Tc is imposed as a time constraint to classify a test instance. Furthermore, most existing stream classification approaches assume that the true label of a data point can be accessed immediately after the data point is classified. In reality, a time delay Tl is involved in obtaining the true label of a data point, since manual labeling is time-consuming. We show how to make fast and correct classification decisions under these constraints and apply them to real benchmark data. Comparison with state-of-the-art stream classification techniques proves the superiority of our approach.
Keywords :
pattern classification; class detection mechanism; concept-drifting data streams; data stream classification technique; maximum allowable wait time; time constraints; Classification algorithms; Clustering algorithms; Delay effects; Fault detection; Intrusion detection; Labeling; Testing; Text categorization; Time factors; USA Councils; Data streams; K-means clustering; concept-drift; ensemble classification; k-nearest neighbor classification; novel class; silhouette coefficient
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.61
Filename :
5453372