  • DocumentCode
    840964
  • Title
    A Comparison of Decision Tree Ensemble Creation Techniques
  • Author
    Banfield, R.E.; Hall, L.O.; Bowyer, K.W.; Kegelmeyer, W.P.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., South Florida Univ., Tampa, FL
  • Volume
    29
  • Issue
    1
  • fYear
    2007
  • Firstpage
    173
  • Lastpage
    180
  • Abstract
    We experimentally evaluate bagging and seven other randomization-based approaches to creating an ensemble of decision tree classifiers. Statistical tests were performed on experimental results from 57 publicly available data sets. When cross-validation comparisons were tested for statistical significance, the best method was statistically more accurate than bagging on only eight of the 57 data sets. Alternatively, examining the average ranks of the algorithms across the group of data sets, we find that boosting, random forests, and randomized trees are statistically significantly better than bagging. Because our results suggest that using an appropriate ensemble size is important, we introduce an algorithm that decides when a sufficient number of classifiers has been created for an ensemble. Our algorithm uses the out-of-bag error estimate and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
  • Keywords
    decision trees; learning (artificial intelligence); pattern classification; random processes; statistical analysis; bagging; decision tree classifier; decision tree ensemble creation technique; random forest; randomization-based approach; statistical test; Boosting; Classification tree analysis; Performance evaluation; Sampling methods; Testing; Training data; Classifier ensembles; random forests; random subspaces; Algorithms; Artificial Intelligence; Decision Support Techniques; Information Storage and Retrieval; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2007.250609
  • Filename
    4016560