• DocumentCode
    245658
  • Title
    An Improvement to Feature Selection of Random Forests on Spark
  • Author
    Ke Sun ; Wansheng Miao ; Xin Zhang ; Ruonan Rao
  • Author_Institution
    Sch. of Software, Shanghai JiaoTong Univ., Shanghai, China
  • fYear
    2014
  • fDate
    19-21 Dec. 2014
  • Firstpage
    774
  • Lastpage
    779
  • Abstract
    The Random Forests algorithm belongs to the class of ensemble learning methods, which are commonly used in classification problems. In this paper, we study the problem of applying the Random Forests algorithm to learn from raw data in real usage scenarios. We propose an improvement that is stable, strict, highly efficient, data-driven, and problem-independent, and that has no impact on algorithm performance, to address two practical issues in feature selection for the Random Forests algorithm. The first is to eliminate noisy features, which are irrelevant to the classification. The second is to eliminate redundant features, which are highly related to other features but contribute nothing useful. We implemented our approach on Spark. Experiments were performed to evaluate the improvement, and the results show that our approach performs well.
  • Keywords
    feature selection; learning (artificial intelligence); pattern classification; Spark; classification problem; ensemble learning methods; noisy feature elimination; random forest algorithm; redundant feature elimination; Accuracy; Classification algorithms; Decision trees; Noise measurement; Radio frequency; Sparks; Vegetation; Random Forests; feature importance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4799-7980-6
  • Type
    conf
  • DOI
    10.1109/CSE.2014.159
  • Filename
    7023669