Title :
Decision tree design from a communication theory standpoint
Author :
Goodman, Rodney M. ; Smyth, Padhraic
Author_Institution :
Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA
fDate :
9/1/1988
Abstract :
A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm, called the delta-entropy rule, is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.
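The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea behind top-down mutual information splitting: at a node, pick the discrete feature that carries the most information about the class label. It is not the authors' implementation, it does not include the delta-entropy termination rule, and the names `entropy`, `mutual_information`, and `best_split` are hypothetical helpers introduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(feature_values, labels):
    """I(feature; class) = H(class) - H(class | feature) for discrete data."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each branch, weighted by branch size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def best_split(rows, labels):
    """Return (feature index, score) of the feature with maximum mutual information.

    Assumption: each row is a tuple of discrete feature values.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    best = max(range(n_features), key=scores.__getitem__)
    return best, scores[best]
```

A full tree builder would apply `best_split` recursively to each branch and stop according to some termination rule; the rule proposed in the paper is not reproduced here.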
Keywords :
decision theory; encoding; information theory; trees (mathematics); Shannon-Fano prefix coding; communication theory; decision tree design; delta-entropy rule; rate-distortion interpretation; single nearest neighbour classifier; termination rule; top-down mutual information algorithm; Algorithm design and analysis; Classification tree analysis; Decision trees; Expert systems; Mutual information; Nearest neighbor searches; Noise robustness; Pattern recognition; Rate-distortion; Space technology
Journal_Title :
Information Theory, IEEE Transactions on