• DocumentCode
    1961716
  • Title
    CMP: a fast decision tree classifier using multivariate predictions
  • Author
    Wang, Haixun ; Zaniolo, Carlo
  • Author_Institution
    Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    449
  • Lastpage
    460
  • Abstract
    Most decision tree classifiers keep class histograms for single attributes and use those histograms to select the attribute for the next split. We propose a technique that, by keeping histograms on attribute pairs, achieves (1) a significant speed-up over traditional classifiers based on single-attribute splitting, and (2) the ability to build classifiers that use linear combinations of values from non-categorical attribute pairs as the split criterion. Indeed, by keeping two-dimensional histograms, CMP can often predict the best successive split in addition to computing the current one; therefore, CMP can normally grow more than one level of a decision tree per data scan. CMP's performance improvements are also due to techniques whereby non-categorical attributes are discretized without loss in classification accuracy; in fact, we introduce simple techniques whereby classification errors caused by discretization at one step can be corrected in the following step. In summary, CMP is a unified algorithm that extends the functionality of existing classifiers and improves their performance.
  • Keywords
    classification; data mining; database theory; decision trees; software performance evaluation; very large databases; CMP; attribute pairs; class histograms; fast decision tree classifier; multivariate predictions; performance improvements; single attribute splitting; Classification tree analysis; Databases; Ear; Genetics; Histograms; Machine learning; Read only memory; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2000. Proceedings. 16th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0506-6
  • Type
    conf
  • DOI
    10.1109/ICDE.2000.839444
  • Filename
    839444
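
The abstract's central idea, keeping class histograms on attribute pairs so that the split after the current one can be estimated without rescanning the data, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`pair_histogram`, `gini`, `best_split`) are hypothetical, Gini impurity is assumed as the split measure because the abstract does not name one, both attributes are assumed to be already discretized into integer bins, and only axis-parallel splits are shown (not the linear-combination splits the abstract also mentions).

```python
import numpy as np

def pair_histogram(a_bins, b_bins, labels, n_a, n_b, n_classes):
    """Count records per (A-bin, B-bin, class) cell in one pass over the data."""
    hist = np.zeros((n_a, n_b, n_classes), dtype=np.int64)
    np.add.at(hist, (a_bins, b_bins, labels), 1)
    return hist

def gini(counts):
    """Gini impurity of a class-count vector; 0.0 for an empty node."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts / total
    return 1.0 - float(np.sum(p * p))

def best_split(class_hist):
    """Best threshold for a (n_bins, n_classes) histogram.

    Returns (k, score): records with bin <= k go left, and score is the
    size-weighted Gini impurity of the two children (lower is better).
    """
    total = class_hist.sum()
    cum = np.cumsum(class_hist, axis=0)
    best_k, best_score = None, float("inf")
    for k in range(class_hist.shape[0] - 1):
        left = cum[k]
        right = cum[-1] - left
        score = (left.sum() * gini(left) + right.sum() * gini(right)) / total
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy data (assumed): every (A-bin, B-bin) combination once,
# with class 1 exactly when A-bin >= 2 and B-bin >= 1.
a = np.repeat(np.arange(4), 3)
b = np.tile(np.arange(3), 4)
y = ((a >= 2) & (b >= 1)).astype(np.int64)

hist = pair_histogram(a, b, y, n_a=4, n_b=3, n_classes=2)

# Current split: the marginal over B is the ordinary single-attribute histogram for A.
k_a, score_a = best_split(hist.sum(axis=1))

# Next split: the same 2-D histogram, restricted to the right child's A-bins,
# already holds that child's class histogram on B, so no second scan is needed.
right_child_b_hist = hist[k_a + 1:].sum(axis=0)
k_b, score_b = best_split(right_child_b_hist)
```

In this toy case the pair histogram is exact for the child node because the split on A falls on a bin boundary. With real discretized data a split point inside a bin would make the child histogram only approximate, which is the kind of discretization error the abstract says is corrected in the following step.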