Title :
Analysis of a complexity-based pruning scheme for classification trees
Author :
Nobel, Andrew B.
Author_Institution :
Dept. of Stat., North Carolina Univ., Chapel Hill, NC, USA
fDate :
8/1/2002 12:00:00 AM
Abstract :
A complexity-based pruning procedure for classification trees is described, and bounds on its finite sample performance are established. The procedure selects a subtree of a (possibly random) initial tree in order to minimize a complexity penalized measure of empirical risk. The complexity assigned to a subtree is proportional to the square root of its size. Two cases are considered. In the first, the growing and pruning data sets are identical, and in the second, they are independent Using the performance bound, the Bayes risk consistency of pruned trees obtained via the procedure is established when the sequence of initial trees satisfies suitable geometric and structural constraints. The pruning method and its analysis are motivated by work on adaptive model selection using complexity regularization.
Keywords :
Bayes methods; adaptive systems; computational complexity; trees (mathematics); Bayes risk consistency; adaptive model selection; classification trees; complexity penalized measure minimization; complexity regularization; complexity-based pruning; finite sample performance bounds; geometric constraints; growing data set; performance bound; pruning data set; random initial tree; structural constraints; subtree; Classification tree analysis; Cost function; Decision trees; Diseases; Histograms; Pattern recognition; Performance evaluation; Random variables; Regression tree analysis; Testing;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2002.800482