Title :
The research of decision tree mining based on Hadoop
Author :
Lu, Qiu ; Cheng, Xiao-hui
Author_Institution :
Sch. of Inf. Sci. & Eng., Guilin Univ. of Technol., Guilin, China
Abstract :
On a single node, decision-tree mining over massive data sets is computationally very expensive. To address this problem, this paper proposes HF_SPRINT, a parallel algorithm built on the Hadoop platform. The algorithm optimizes and improves the SPRINT algorithm and parallelizes it. Experimental results show that the algorithm achieves a high speedup (acceleration ratio) and good scalability.
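The abstract does not describe HF_SPRINT's internals. As a rough illustration only, the sketch below assumes the usual SPRINT split criterion (the gini index evaluated at each candidate threshold of a sorted numeric attribute) and a MapReduce-style division of work. Each data partition builds per-value class counts (map), the partial counts are merged (reduce), and the merged counts are swept to find the threshold with the lowest weighted gini. It runs in a single Python process rather than on Hadoop, and the function names (map_partition, merge_histograms, best_split) are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical helpers illustrating generic SPRINT-style split evaluation;
# this is NOT the paper's HF_SPRINT code and does not use the Hadoop API.

def gini(counts):
    """Gini index 1 - sum(p_j^2) for a mapping of class label -> count."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def map_partition(records):
    """Map step: for one partition of (value, label) pairs,
    count class labels per distinct attribute value."""
    hist = defaultdict(Counter)
    for value, label in records:
        hist[value][label] += 1
    return hist

def merge_histograms(a, b):
    """Reduce step: combine two partial per-value class histograms."""
    merged = defaultdict(Counter)
    for h in (a, b):
        for value, counts in h.items():
            merged[value].update(counts)
    return merged

def best_split(hist):
    """Sweep thresholds 'value <= v' in sorted order, as SPRINT does over a
    sorted attribute list, and return (threshold, weighted gini of the split)."""
    total = Counter()
    for counts in hist.values():
        total.update(counts)
    n = sum(total.values())
    below = Counter()
    best = (None, float("inf"))
    for v in sorted(hist)[:-1]:  # splitting after the largest value leaves one side empty
        below.update(hist[v])
        above = total - below
        n1 = sum(below.values())
        score = n1 / n * gini(below) + (n - n1) / n * gini(above)
        if score < best[1]:
            best = (v, score)
    return best

# Usage: a numeric attribute (e.g. age) with class labels, split over two partitions.
partitions = [
    [(23, "high"), (17, "high"), (43, "high")],
    [(68, "low"), (32, "low"), (20, "high")],
]
hist = reduce(merge_histograms, map(map_partition, partitions))
threshold, score = best_split(hist)
print(threshold, round(score, 4))  # 23 0.2222
```

Merging class-count histograms rather than raw records keeps the reduce input proportional to the number of distinct attribute values, which is one common reason this kind of split evaluation parallelizes well; whether HF_SPRINT distributes the work exactly this way is not stated in the abstract.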
Keywords :
data mining; decision trees; parallel algorithms; public domain software; HF_SPRINT parallel algorithm; Hadoop platform; data mining categorization; data mining technologies; decision tree mining calculation; distributed software framework; Acceleration; Algorithm design and analysis; Classification algorithms; Data mining; Educational institutions; Indexes; Parallel processing; Hadoop; MapReduce; SPRINT;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6234264