Author :
He, Ping ; Chen, Ling ; Xu, Xiao-Hua
Author_Institution :
Yangzhou Univ., Yangzhou
Abstract :
C4.5 is a well-known and widely used machine learning algorithm; however, its runtime performance was sacrificed to accommodate the limited main memory available at the time it was developed. We present a fast implementation of the C4.5 algorithm, named FC4.5 (Fast C4.5). It organizes data in novel structures, uses an indirect bucket sort combined with a bit-parallel technique, and confines the binary search for the cutoff to the narrowest possible range. Together, these techniques enable FC4.5 to greatly accelerate the tree-construction process of the C4.5 algorithm. Experiments show that FC4.5 builds the same decision tree as the C4.5 (Release 8) system, with a runtime performance gain of up to 5.8 times. FC4.5 also scales well across different kinds of datasets.
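Illustrative note (not part of the original record): the abstract names two of the techniques without detail, so the following is only a minimal sketch of the general ideas, not the authors' FC4.5 implementation. It assumes attribute values have already been mapped to small integer codes, and the function names `indirect_bucket_sort` and `largest_not_exceeding` are hypothetical. The first function sorts record indices rather than the records themselves; the second uses binary search to find the largest observed value that does not exceed a candidate midpoint. The bit-parallel technique is not shown.

```python
from bisect import bisect_right


def indirect_bucket_sort(values, num_buckets):
    """Return the indices of `values` ordered by value, leaving `values` untouched.

    Assumption: every value is an integer in range(num_buckets).
    """
    buckets = [[] for _ in range(num_buckets)]
    for idx, v in enumerate(values):
        buckets[v].append(idx)  # store the index, not the record itself
    return [idx for bucket in buckets for idx in bucket]


def largest_not_exceeding(sorted_values, midpoint):
    """Binary-search the largest value <= midpoint; return None if there is none."""
    pos = bisect_right(sorted_values, midpoint)
    return sorted_values[pos - 1] if pos else None


# Hypothetical integer-coded attribute column.
values = [3, 1, 2, 1, 0]
order = indirect_bucket_sort(values, num_buckets=4)
sorted_values = [values[i] for i in order]
cutoff = largest_not_exceeding(sorted_values, 1.5)
```

Sorting indices instead of records keeps the original data in place, and the binary search replaces a linear scan over the attribute values once they are ordered.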
Keywords :
data structures; learning (artificial intelligence); trees (mathematics); bit-parallel technique; data structure; fast C4.5 algorithm; indirect bucket-sort; machine learning algorithm; Acceleration; Computer science; Cybernetics; Data mining; Decision trees; Machine learning; Machine learning algorithms; Performance gain; Runtime; Scalability; Bit-parallel technique; C4.5; Classification; Fast; Indirect bucket-sort;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370632