• DocumentCode
    327289
  • Title

    Parallel formulations of decision-tree classification algorithms

  • Author

    Srivastava, Anurag ; Han, Eui-Hong Sam ; Singh, Vineet ; Kumar, Vipin

  • Author_Institution
    Lab. of Inf. Technol., Hitachi America Ltd., Brisbane, CA, USA
  • fYear
    1998
  • fDate
    10-14 Aug 1998
  • Firstpage
    237
  • Lastpage
    244
  • Abstract
    Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. We present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations. One is based on the Synchronous Tree Construction Approach and the other is based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of using these methods and propose a hybrid method that employs the good features of these methods. Experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
  • Keywords
    decision theory; deductive databases; knowledge acquisition; learning by example; parallel algorithms; pattern classification; tree data structures; trees (mathematics); IBM SP-2; Partitioned Tree Construction Approach; Synchronous Tree Construction Approach; classification decision tree algorithms; classification decision tree learning algorithm; data mining; decision tree classification algorithms; fraud detection; highly parallel algorithms; hybrid method; induction; inherent dynamic nature; large data sets; natural concurrency; parallel formulations; retail target marketing; Classification algorithms; Classification tree analysis; Computer science; Concurrent computing; Contracts; Data mining; Decision trees; Identity-based encryption; Information technology; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 1998. Proceedings. 1998 International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Print_ISBN
    0-8186-8650-2
  • Type

    conf

  • DOI
    10.1109/ICPP.1998.708491
  • Filename
    708491