• DocumentCode
    2379742
  • Title

    Application of random forest data mining method to the feature selection for female sub-health state

  • Author

    Wang, Li-Min ; Chen, Jia-Xu ; Fan, Min ; Zhao, Xin ; Cui, Hua-Ting ; Qou, Mei-jing ; Wang, Shao-xian ; Li, Xiao-hong ; Jiang, You-ming ; Zhou, Li-qian ; Peng, Xin

  • Author_Institution
    Beijing Univ. of Chinese Med., Beijing, China
  • fYear
    2010
  • fDate
    18 Dec. 2010
  • Firstpage
    651
  • Lastpage
    654
  • Abstract
    BACKGROUND: Sub-health state is a low-quality status between health and disease. The aim of this study was to determine which factors and/or combinations of factors could be predictive of sub-health state in females, using the random forest method. METHODS: Data were collected through a clinical epidemiology survey, yielding 2992 cases (2507 in sub-health state and 485 in health), of which 1285 were female sub-health state cases and 177 were female health state cases. Based on association defined by mutual information, we used a classification technique called Random Forest to predict the sub-health state in females through analysis of the clinical data. RESULTS: We obtained a total out-of-bag (OOB) error rate of 20.06%, that is, a correct classification rate of 79.94%. Ten variables were very powerful in discriminating between health state and sub-health state in females. They were the following symptoms: Fatigue, Myasthenia of limbs, Amnesia, Dizziness, Dysphoria, Sighing, Hypochondriac distension and pain, Constipation, Swollen sore throat and Premenstrual Distension of Breast. CONCLUSIONS: We suggest the random forest data mining method for feature selection in female sub-health state; the main advantage of this method is that it selects important features while retaining a high predictive accuracy.
  • Keywords
    data mining; diseases; medical diagnostic computing; random processes; disease; feature selection; female health state; random forest data mining; subhealth state; data mining; female; random forest; sub-health state;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-8303-7
  • Electronic_ISBN
    978-1-4244-8304-4
  • Type

    conf

  • DOI
    10.1109/BIBMW.2010.5703880
  • Filename
    5703880