Title :
On the selection of decision trees in Random Forests
Author :
Bernard, Simon ; Heutte, Laurent ; Adam, Sébastien
Author_Institution :
Univ. of Rouen, St. Etienne du Rouvray, France
Abstract :
In this paper we present a study on the random forest (RF) family of ensemble methods. In a "classical" RF induction process, a fixed number of randomized decision trees are induced to form an ensemble. This kind of algorithm presents two main drawbacks: (i) the number of trees has to be fixed a priori; (ii) the interpretability and analysis capacities offered by decision tree classifiers are lost due to the randomization principle. Such a process, in which trees are independently added to the ensemble, offers no guarantee that all those trees will cooperate effectively in the same committee. This observation raises two questions: are there decision trees in an RF that degrade ensemble performance? If so, is it possible to form a more accurate committee by removing the decision trees with poor performance? We address these questions as a classifier selection problem. We show that better subsets of decision trees can be obtained even with a sub-optimal classifier selection method. This shows that the "classical" RF induction process, in which randomized trees are arbitrarily added to the ensemble, is not the best approach for producing accurate RF classifiers. We also show the value of designing RFs by adding trees in a more dependent way than is traditionally done in "classical" RF induction algorithms.
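The classifier selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the selection method, so greedy forward selection is assumed here as one possible sub-optimal search. The sketch also assumes plurality voting and that each tree's predictions on a held-out validation set are already available as plain lists. The function names `majority_vote_accuracy` and `forward_select` are hypothetical.

```python
from collections import Counter


def majority_vote_accuracy(tree_preds, labels):
    """Accuracy of the plurality vote over the given per-tree predictions."""
    correct = 0
    for i, truth in enumerate(labels):
        votes = Counter(preds[i] for preds in tree_preds)
        # Ties are broken in favour of the class voted for first (tree order).
        if votes.most_common(1)[0][0] == truth:
            correct += 1
    return correct / len(labels)


def forward_select(tree_preds, labels):
    """Greedy forward selection (an assumed, sub-optimal search strategy).

    At each step, add the tree whose inclusion gives the highest vote
    accuracy, and remember the best subset seen over all sizes.
    Returns (best_subset_indices, best_accuracy).
    """
    remaining = list(range(len(tree_preds)))
    selected = []
    best_subset, best_acc = [], 0.0
    while remaining:
        # Score every candidate tree when added to the current subset.
        scored = [
            (majority_vote_accuracy(
                [tree_preds[j] for j in selected + [t]], labels), t)
            for t in remaining
        ]
        acc, pick = max(scored, key=lambda s: s[0])
        selected.append(pick)
        remaining.remove(pick)
        if acc > best_acc:
            best_subset, best_acc = list(selected), acc
    return best_subset, best_acc
```

Comparing `forward_select(tree_preds, labels)[1]` with `majority_vote_accuracy(tree_preds, labels)` on the full forest gives a rough check of whether a subset outvotes the whole ensemble on the validation data. Because the subset is chosen on that same data, a separate test set would be needed for an unbiased estimate.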
Keywords :
decision trees; learning (artificial intelligence); pattern classification; randomised algorithms; sampling methods; set theory; classical RF induction algorithm; classifier ensemble method; machine learning; random forest; sampling method; suboptimal randomized decision tree classifier selection problem; subset theory; Algorithm design and analysis; Bagging; Boosting; Classification tree analysis; Decision trees; Neural networks; Radio frequency; Radiofrequency identification; Sampling methods; Training data;
Conference_Title :
Neural Networks, 2009. IJCNN 2009. International Joint Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-3548-7
Electronic_ISBN :
1098-7576
DOI :
10.1109/IJCNN.2009.5178693