Title :
An Improvement to Feature Selection of Random Forests on Spark
Author :
Ke Sun ; Wansheng Miao ; Xin Zhang ; Ruonan Rao
Author_Institution :
Sch. of Software, Shanghai JiaoTong Univ., Shanghai, China
Abstract :
The Random Forests algorithm belongs to the class of ensemble learning methods, which are commonly used for classification problems. In this paper, we study the problem of applying the Random Forests algorithm to raw data from real usage scenarios. We propose an improvement that is stable, strict, highly efficient, data-driven, and problem-independent, and that has no impact on algorithm performance, to address two practical issues in feature selection for the Random Forests algorithm. The first is eliminating noisy features, which are irrelevant to the classification. The second is eliminating redundant features, which are highly related to other features but useless. We implemented our approach on Spark. Experiments evaluating the improvement show that our approach performs well.
Keywords :
feature selection; learning (artificial intelligence); pattern classification; Spark; classification problem; ensemble learning methods; noisy feature elimination; random forest algorithm; redundant feature elimination; Accuracy; Classification algorithms; Decision trees; Noise measurement; Radio frequency; Sparks; Vegetation; Random Forests; feature importance
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.159