DocumentCode :
1137065
Title :
An Efficient Algorithm for Generating Generalized Decision Forests
Author :
Zhao, Huimin ; Sinha, Atish P.
Author_Institution :
Sch. of Bus. Adm., Univ. of Wisconsin-Milwaukee, Milwaukee, WI, USA
Volume :
35
Issue :
5
Year :
2005
Firstpage :
754
Lastpage :
762
Abstract :
A shortcoming of univariate decision tree learners is that they do not learn intermediate concepts and select only one of the input features for the branching decision at each intermediate tree node. It has been empirically demonstrated that cascading other classification methods, which do learn intermediate concepts, with decision tree learners can alleviate this representational bias and potentially improve classification performance. However, a more complex model that fits the training data better does not necessarily perform better on unseen data, a problem commonly referred to as overfitting. To find the most appropriate degree of such cascade generalization, a decision forest (i.e., a set of decision trees with other classification models cascaded to different degrees) needs to be generated, from which the best decision tree can then be identified. In this paper, the authors propose an efficient algorithm for generating such decision forests. The algorithm uses an extended decision tree data structure and constructs any node that is common to multiple decision trees only once. The authors empirically evaluated the algorithm on 32 classification data sets from the University of California, Irvine (UCI) machine learning repository and report results demonstrating its efficiency.
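The node-sharing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the toy XOR data, the `cascaded_feature` stand-in for a cascaded classifier's output, the error-based split rule, and the cache key `(rows, cascade_levels)` are all assumptions made for illustration. It only shows how a node reached by several trees of a forest can be constructed once and reused.

```python
from collections import Counter

# Toy XOR data (an assumption for illustration): two binary features, one label.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]


def cascaded_feature(x):
    # Hypothetical stand-in for the prediction of a cascaded classifier.
    return x[0] ^ x[1]


class Node:
    def __init__(self, rows, label):
        self.rows = rows      # frozenset of training-row indices at this node
        self.label = label    # majority class at this node
        self.split = None     # name of the split feature, None for a leaf
        self.children = ()    # child nodes, possibly shared with other trees


class ForestBuilder:
    def __init__(self):
        self.cache = {}   # (rows, cascade_levels) -> Node
        self.built = 0    # number of nodes actually constructed

    def build(self, rows, cascade_levels):
        labels = [y[i] for i in sorted(rows)]
        pure = len(set(labels)) == 1
        if pure:
            cascade_levels = 0  # a pure leaf is identical in every tree
        key = (rows, cascade_levels)
        if key in self.cache:   # node already built for another tree: reuse it
            return self.cache[key]
        self.built += 1
        node = Node(rows, Counter(labels).most_common(1)[0][0])
        best = None if pure else self._best_split(rows, cascade_levels > 0)
        if best is not None:
            _, node.split, parts = best
            child_levels = max(cascade_levels - 1, 0)
            node.children = tuple(self.build(p, child_levels) for p in parts)
        self.cache[key] = node
        return node

    def _best_split(self, rows, use_cascade):
        features = [("x0", lambda x: x[0]), ("x1", lambda x: x[1])]
        if use_cascade:
            features.append(("cascade", cascaded_feature))
        best = None
        for name, f in features:
            parts = [frozenset(i for i in rows if f(X[i]) == v) for v in (0, 1)]
            if not all(parts):  # skip features that do not separate the rows
                continue
            errors = sum(len(p) - max(Counter(y[i] for i in p).values())
                         for p in parts)
            if best is None or errors < best[0]:
                best = (errors, name, parts)
        return best


def count_nodes(node):
    # Size of one tree when counted on its own, ignoring sharing.
    return 1 + sum(count_nodes(c) for c in node.children)


builder = ForestBuilder()
all_rows = frozenset(range(len(X)))
# One tree per cascade degree: the top `degree` levels may use the cascaded feature.
forest = [builder.build(all_rows, degree) for degree in range(3)]
total_nodes = sum(count_nodes(tree) for tree in forest)
```

In this sketch `builder.built` is smaller than `total_nodes` whenever two trees reach the same row subset with the same remaining cascade depth, which is one plausible reading of "common to multiple decision trees"; the paper's extended data structure may identify common nodes differently.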
Keywords :
data mining; data structures; decision trees; cascade generalization; data structure; generalized decision forest; univariate decision tree learners; classification tree analysis; decision making; machine learning; machine learning algorithms; predictive models; supervised learning; training data; tree data structures; classification; decision forest; decision tree
Language :
English
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4427
Type :
Journal article
DOI :
10.1109/TSMCA.2005.843392
Filename :
1495617