• DocumentCode
    56835
  • Title
    E-Tree: An Efficient Indexing Structure for Ensemble Models on Data Streams
  • Author
    Peng Zhang ; Chuan Zhou ; Peng Wang ; Byron J. Gao ; Xingquan Zhu ; Li Guo
  • Author_Institution
    Inst. of Inf. Eng., Beijing, China
  • Volume
    27
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 2015
  • Firstpage
    461
  • Lastpage
    474
  • Abstract
    Ensemble learning is a common tool for data stream classification, mainly because of its inherent ability to handle large volumes of stream data and concept drift. Previous studies have focused primarily on building accurate ensemble models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real-world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at rates of gigabytes per second, and each stream record must be classified in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure that organizes all base classifiers in an ensemble for fast prediction. On the one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be updated automatically by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns underlying data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
  • Keywords
    data mining; indexing; learning (artificial intelligence); pattern classification; E-tree indexing structure; R-tree like height-balanced structure; Web traffic stream monitoring; concept drift; data stream classification; ensemble learning model; ensemble-tree; intrusion detection; linear complexity; spam detection; spatial database; sub-linear complexity; Adaptation models; Data models; Indexing; Market research; Monitoring; Spatial databases; Stream data mining; classification; concept drifting; ensemble learning; spatial indexing;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2298018
  • Filename
    6709813
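
The abstract's central idea, replacing a linear scan over base classifiers with a spatial index, can be illustrated with a minimal sketch. It is not the E-tree algorithm from the paper. It assumes, purely for illustration, that each base classifier contributes decision rules that cover axis-aligned boxes in feature space, and it uses a single level of bounding boxes rather than a height-balanced R-tree. All names (`Rule`, `Node`, `build_index`, `predict_linear`, `predict_indexed`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# One (low, high) interval per feature; an axis-aligned box in feature space.
Box = Tuple[Tuple[float, float], ...]


@dataclass
class Rule:
    """Hypothetical decision rule: predicts `label` for records inside `box`."""
    box: Box
    label: str


@dataclass
class Node:
    """Index node: a bounding box enclosing the boxes of all rules it holds."""
    box: Box
    rules: List[Rule]


def covers(box: Box, x: Sequence[float]) -> bool:
    """True if record x lies inside the box on every feature."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, x))


def bounding_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all given boxes."""
    dims = range(len(boxes[0]))
    return tuple(
        (min(b[d][0] for b in boxes), max(b[d][1] for b in boxes)) for d in dims
    )


def build_index(rules: List[Rule], fanout: int = 2) -> List[Node]:
    """Group rules into nodes (assumption: sorted by lower bound of feature 0)."""
    ordered = sorted(rules, key=lambda r: r.box[0][0])
    groups = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]
    return [Node(bounding_box([r.box for r in g]), g) for g in groups]


def predict_linear(rules: List[Rule], x: Sequence[float]) -> Optional[str]:
    """Baseline: test every rule, then take a majority vote."""
    votes = Counter(r.label for r in rules if covers(r.box, x))
    return votes.most_common(1)[0][0] if votes else None


def predict_indexed(nodes: List[Node], x: Sequence[float]) -> Optional[str]:
    """Skip whole groups of rules whose bounding box does not contain x."""
    votes: Counter = Counter()
    for node in nodes:
        if covers(node.box, x):
            votes.update(r.label for r in node.rules if covers(r.box, x))
    return votes.most_common(1)[0][0] if votes else None


rules = [
    Rule(((0.0, 1.0), (0.0, 1.0)), "spam"),
    Rule(((0.5, 1.5), (0.0, 1.0)), "spam"),
    Rule(((0.0, 1.0), (0.5, 2.0)), "ham"),
    Rule(((5.0, 6.0), (5.0, 6.0)), "ham"),
]
index = build_index(rules)
record = (0.75, 0.25)
print(predict_linear(rules, record), predict_indexed(index, record))
```

Both functions return the same vote for this record; the indexed version simply avoids testing rules in groups whose bounding box excludes it. A real R-tree-like structure would nest such nodes over several levels and support insertion and deletion as classifiers are added and retired.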