DocumentCode :
6821
Title :
Decision Trees for Mining Data Streams Based on the McDiarmid´s Bound
Author :
Rutkowski, Leszek ; Pietruczuk, Lena ; Duda, Piotr ; Jaworski, M.
Author_Institution :
Inst. of Comput. Intell., Czestochowa Univ. of Technol., Czestochowa, Poland
Volume :
25
Issue :
6
fYear :
2013
fDate :
Jun-13
Firstpage :
1272
Lastpage :
1279
Abstract :
In mining data streams the most popular tool is the Hoeffding tree algorithm. It uses the Hoeffding´s bound to determine the smallest number of examples needed at a node to select a splitting attribute. In the literature the same Hoeffding´s bound was used for any evaluation function (heuristic measure), e.g., information gain or Gini index. In this paper, it is shown that the Hoeffding´s inequality is not appropriate to solve the underlying problem. We prove two theorems presenting the McDiarmid´s bound for both the information gain, used in ID3 algorithm, and for Gini index, used in Classification and Regression Trees (CART) algorithm. The results of the paper guarantee that a decision tree learning system, applied to data streams and based on the McDiarmid´s bound, has the property that its output is nearly identical to that of a conventional learner. The results of the paper have a great impact on the state of the art of mining data streams and various developed so far methods and algorithms should be reconsidered.
Keywords :
data mining; decision trees; pattern classification; regression analysis; CART; Gini index; Hoeffding bound; Hoeffding tree algorithm; ID3 algorithm; McDiarmid bound; classification and regression trees algorithm; data stream mining; decision tree learning system; evaluation function; Data mining; Decision trees; Entropy; Gain measurement; Indexes; Learning systems; Random variables; Data streams; Gini index; Hoeffding´s bound; McDiarmid´s bound; decision trees; information gain;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2012.66
Filename :
6171195
Link To Document :
بازگشت