DocumentCode :
1971414
Title :
Supervised multivariate discretization in mixed data with Random Forests
Author :
Berrado, Abdelaziz ; Runger, George C.
Author_Institution :
Ind. Eng., EMI, Rabat
fYear :
2009
fDate :
10-13 May 2009
Firstpage :
211
Lastpage :
217
Abstract :
Discretizing continuous attributes is necessary before association rules mining or before using several inductive learning algorithms on a heterogeneous data space. This data preprocessing step should be carried out with minimal information loss; that is, the mutual information between attributes on the one hand, and between attributes and the class labels on the other, should not be destroyed. This paper introduces a novel supervised, global, and dynamic discretization algorithm called RFDisc (Random Forests Discretizer). It derives its ability to conserve the data properties from the Random Forests learning algorithm. RFDisc is simple, relatively fast, and automatically learns the number of bins into which each continuous attribute is to be discretized. Empirical results on several data sets indicate that the accuracies of classification algorithms such as CART are comparable before and after discretization using RFDisc. Furthermore, C5.0 achieves the highest classification accuracy with data discretized with RFDisc when compared with other well-known discretization algorithms.
Keywords :
data mining; learning by example; random processes; RFDisc; association rules mining; classification algorithms; continuous attributes; data preprocessing step; dynamic discretization algorithm; inductive learning algorithms; random forests discretizer; random forests learning algorithm; supervised multivariate discretization; Association rules; Classification algorithms; Data mining; Design automation; Electromagnetic interference; Heuristic algorithms; Industrial engineering; Information entropy; Machine learning algorithms; Partitioning algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
Conference_Location :
Rabat
Print_ISBN :
978-1-4244-3807-5
Electronic_ISBN :
978-1-4244-3806-8
Type :
conf
DOI :
10.1109/AICCSA.2009.5069327
Filename :
5069327