• DocumentCode
    62625
  • Title

    Decision Trees for Mining Data Streams Based on the Gaussian Approximation

  • Author

    Rutkowski, Leszek ; Jaworski, M. ; Pietruczuk, Lena ; Duda, Piotr

  • Author_Institution
    Inst. of Comput. Intell., Czestochowa Univ. of Technol., Czestochowa, Poland
  • Volume
    26
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    108
  • Lastpage
    119
  • Abstract
    Since the Hoeffding tree algorithm was proposed in the literature, decision trees became one of the most popular tools for mining data streams. The key point of constructing the decision tree is to determine the best attribute to split the considered node. Several methods to solve this problem were presented so far. However, they are either wrongly mathematically justified (e.g., in the Hoeffding tree algorithm) or time-consuming (e.g., in the McDiarmid tree algorithm). In this paper, we propose a new method which significantly outperforms the McDiarmid tree algorithm and has a solid mathematical basis. Our method ensures, with a high probability set by the user, that the best attribute chosen in the considered node using a finite data sample is the same as it would be in the case of the whole data stream.
  • Keywords
    Gaussian processes; data mining; decision trees; probability; Gaussian approximation; Hoeffding tree algorithm; McDiarmid tree algorithm; decision trees; finite data sample; mining data streams; probability set; solid mathematical basis; Data mining; Decision trees; Entropy; Impurities; Indexes; Random variables; Training; Data steam; Gaussian approximation; decision trees; information gain;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2013.34
  • Filename
    6466324