Title :
A parallel algorithm to induce decision trees for large datasets
Author :
Franco-Arcega, A. ; Suarez-Cansino, J. ; Flores-Flores, L.G.
Author_Institution :
Inf. & Syst. Technol. Res. Center, Autonomous Univ. of the State of Hidalgo, Hidalgo, Mexico
fDate :
Oct. 30, 2013 - Nov. 1, 2013
Abstract :
This paper introduces a new parallel algorithm called ParDTLT and discusses some of its advantages over a set of well-known sequential and parallel algorithms. Parallel processing takes place at every node of the decision tree, which is constructed during the supervised training phase. Parallel tasks are distributed according to the attributes of the training objects, and the growth of the tree is governed by two criteria: the maximum number of training objects that each node can hold, and an entropic gain ratio criterion. Several experiments were carried out to compare the behavior of ParDTLT with that of the sequential algorithms C4.5, VFDT, YaDT and DTLT, and with the parallel algorithm called Synchronous. The experimental results show that ParDTLT preserves classification quality while reducing execution time.
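The abstract names two ingredients: tasks distributed over the attributes of the training objects, and node growth controlled by a node-capacity limit plus a gain ratio criterion. The sketch below illustrates only those two ideas under stated assumptions; it is not the authors' ParDTLT implementation. The function names (gain_ratio, best_split, should_expand), the parameter max_node_size, and the use of a thread pool to score one attribute per task are all hypothetical choices made for illustration, and attributes are assumed to be categorical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio of splitting on a categorical attribute.

    rows: list of dicts mapping attribute name -> value (assumed format).
    """
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    # Information gain = H(parent) - weighted H(children).
    cond = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - cond
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels, attributes, workers=4):
    """Score each candidate attribute as a separate task (hypothetical
    attribute-level distribution) and return the best (attribute, score)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: gain_ratio(rows, labels, a), attributes))
    best = max(range(len(attributes)), key=lambda i: scores[i])
    return attributes[best], scores[best]


def should_expand(labels, max_node_size):
    """Hypothetical growth rule: expand a node only once it holds more than
    max_node_size training objects and those objects are not all one class."""
    return len(labels) > max_node_size and len(set(labels)) > 1


if __name__ == "__main__":
    rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
    labels = ["no", "no", "yes", "yes"]
    if should_expand(labels, max_node_size=3):
        print(best_split(rows, labels, ["a", "b"]))  # ('a', 1.0)
```

In this toy data attribute "a" separates the classes perfectly (gain ratio 1.0) while "b" carries no information (gain ratio 0.0). A thread pool keeps the example simple, but CPython's global interpreter lock limits the speedup for pure-Python scoring; a real parallel implementation would use separate processes or native threads.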
Keywords :
database management systems; decision trees; entropy; parallel algorithms; C4.5 algorithms; DTLT algorithms; ParDTLT; Synchronous algorithm; VFDT algorithms; YaDT algorithms; entropic gain ratio criterion; execution time; large datasets; parallel algorithm; parallel process; parallel task distribution; sequential algorithms; supervised training phase; Algorithm design and analysis; Decision trees; Parallel algorithms; Program processors; Time complexity; Training;
Conference_Title :
2013 XXIV International Symposium on Information, Communication and Automation Technologies (ICAT)
Conference_Location :
Sarajevo
DOI :
10.1109/ICAT.2013.6684045