Title :
The random subspace method for constructing decision forests
Author_Institution :
Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA
Date :
1 August 1998
Abstract :
Much of the previous attention on decision trees has focused on splitting criteria and the optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision-tree-based classifier is proposed that maintains the highest accuracy on training data and improves generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods in experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate it to the combined classification accuracy.
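The abstract describes the forest construction only at a high level. As a rough illustrative sketch of the random subspace idea (not the paper's original implementation), the procedure can be expressed as follows; the class name, parameter defaults, and the use of scikit-learn's DecisionTreeClassifier are assumptions made for illustration.

```python
# Illustrative sketch of the random subspace method: each tree is trained
# on a pseudorandomly chosen subset of feature components, and the forest
# combines the individual trees' class-probability estimates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class RandomSubspaceForest:
    def __init__(self, n_trees=50, subspace_dim=8, random_state=0):
        self.n_trees = n_trees            # number of trees in the forest
        self.subspace_dim = subspace_dim  # feature components per subspace
        self.rng = np.random.default_rng(random_state)
        self.trees, self.subspaces = [], []

    def fit(self, X, y):
        n_features = X.shape[1]
        d = min(self.subspace_dim, n_features)
        for _ in range(self.n_trees):
            # pseudorandomly select a subset of the feature components
            cols = self.rng.choice(n_features, size=d, replace=False)
            # each tree is grown fully on all training points,
            # restricted to the chosen subspace
            tree = DecisionTreeClassifier().fit(X[:, cols], y)
            self.trees.append(tree)
            self.subspaces.append(cols)
        return self

    def predict(self, X):
        # average the per-tree posterior estimates over the forest
        probs = np.mean(
            [t.predict_proba(X[:, cols]) for t, cols in zip(self.trees, self.subspaces)],
            axis=0,
        )
        return self.trees[0].classes_[np.argmax(probs, axis=1)]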
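The sketch combines trees by averaging their probability estimates rather than by a simple majority vote, which is in the spirit of the paper's aggregation of per-tree posterior estimates; the exact splitting criterion and stopping behavior of the trees are left to the library defaults and are not prescribed by the abstract.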
Keywords :
decision theory; learning (artificial intelligence); pattern classification; random processes; trees (mathematics); classification accuracy; decision forests; decision tree based classifier; decision trees; feature vector; generalization accuracy; maximum accuracy; overfitting; random subspace method; Binary trees; Classification tree analysis; Clustering algorithms; Decision trees; Stochastic systems; Support vector machine classification; Support vector machines; Training data
Journal_Title :
IEEE Transactions on Pattern Analysis and Machine Intelligence