DocumentCode
245658
Title
An Improvement to Feature Selection of Random Forests on Spark
Author
Ke Sun ; Wansheng Miao ; Xin Zhang ; Ruonan Rao
Author_Institution
Sch. of Software, Shanghai JiaoTong Univ., Shanghai, China
fYear
2014
fDate
19-21 Dec. 2014
Firstpage
774
Lastpage
779
Abstract
The Random Forests algorithm belongs to the class of ensemble learning methods, which are commonly used in classification problems. In this paper, we study the problem of applying the Random Forests algorithm to learn from raw data in real usage scenarios. We propose an improvement that is stable, strict, highly efficient, data-driven, problem-independent, and has no impact on algorithm performance, to address two practical issues in feature selection for the Random Forests algorithm. The first is eliminating noisy features, which are irrelevant to the classification. The second is eliminating redundant features, which are highly related to other features but contribute nothing useful. We implemented our approach on Spark. Experiments were performed to evaluate the improvement, and the results show that our approach achieves ideal performance.
Keywords
feature selection; learning (artificial intelligence); pattern classification; Spark; classification problem; ensemble learning methods; feature selection; noisy feature elimination; random forest algorithm; redundant feature elimination; Accuracy; Classification algorithms; Decision trees; Noise measurement; Radio frequency; Sparks; Vegetation; Random Forests; Spark; feature importance; feature selection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location
Chengdu
Print_ISBN
978-1-4799-7980-6
Type
conf
DOI
10.1109/CSE.2014.159
Filename
7023669