DocumentCode :
2708684
Title :
C4.5 decision forests
Author :
Ho, Tin Kam
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
Volume :
1
fYear :
1998
fDate :
16-20 Aug 1998
Firstpage :
545
Abstract :
Much of the previous work on decision trees has focused on splitting criteria and the optimization of tree size. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. We propose a method to construct a decision-tree-based classifier that maintains the highest accuracy on training data and improves generalization accuracy as it grows in complexity. Trees are generated using the well-known C4.5 algorithm, and the classifier consists of multiple trees constructed in pseudo-randomly selected subspaces of the given feature space. We compare the method to single-tree classifiers and other forest construction methods in experiments on four public data sets, where the method's superiority is demonstrated. A measure is given to describe the similarity between trees in a forest, and it is related to the combined classification accuracy.
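Code_Sketch :
A minimal sketch of the random-subspace forest idea described in the abstract, not the paper's implementation: it substitutes scikit-learn's DecisionTreeClassifier for C4.5 (an assumption), uses an illustrative class name RandomSubspaceForest with made-up parameters, and combines the trees by simple majority vote, whereas the paper's exact combination rule may differ.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    """Illustrative random-subspace forest; each tree sees a pseudo-random feature subset."""

    def __init__(self, n_trees=100, subspace_size=0.5, random_state=0):
        self.n_trees = n_trees
        self.subspace_size = subspace_size  # fraction of features used per tree
        self.rng = np.random.default_rng(random_state)
        self.trees = []
        self.subspaces = []

    def fit(self, X, y):
        n_features = X.shape[1]
        k = max(1, int(self.subspace_size * n_features))
        for _ in range(self.n_trees):
            # pseudo-randomly select a feature subspace for this tree
            idx = self.rng.choice(n_features, size=k, replace=False)
            tree = DecisionTreeClassifier().fit(X[:, idx], y)
            self.trees.append(tree)
            self.subspaces.append(idx)
        return self

    def predict(self, X):
        # combine the trees by unweighted majority vote (assumes integer class labels)
        votes = np.stack([t.predict(X[:, idx]) for t, idx in zip(self.trees, self.subspaces)])
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes
        )

# Example usage on a toy data set (illustrative only):
#   from sklearn.datasets import load_iris
#   X, y = load_iris(return_X_y=True)
#   print(RandomSubspaceForest(n_trees=50).fit(X, y).predict(X[:5]))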
Keywords :
decision trees; pattern classification; C4.5 decision forests; combined classification accuracy; decision tree based classifier; generalization accuracy; maximum accuracy; overfitting; pseudo-randomly selected subspaces; Australia; Boosting; Computer vision; Conferences; Decision trees; Learning systems; Mathematics; Nearest neighbor searches; Pattern recognition; Stochastic processes;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the Fourteenth International Conference on Pattern Recognition, 1998
Conference_Location :
Brisbane, Qld.
ISSN :
1051-4651
Print_ISBN :
0-8186-8512-3
Type :
conf
DOI :
10.1109/ICPR.1998.711201
Filename :
711201