• DocumentCode
    1283095
  • Title
    Combined Rule Extraction and Feature Elimination in Supervised Classification
  • Author
    Sheng Liu ; Patel, R.Y. ; Daga, P.R. ; Haining Liu ; Gang Fu ; Doerksen, R.J. ; Yixin Chen ; Wilkins, D.E.
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Mississippi, Oxford, MS, USA
  • Volume
    11
  • Issue
    3
  • fYear
    2012
  • Firstpage
    228
  • Lastpage
    236
  • Abstract
    Many biology-related research problems combine multiple sources of data to achieve a better understanding of the underlying problem. It is important to select and interpret the most important information from these sources. Thus, an algorithm that simultaneously extracts rules and selects features is beneficial for interpreting the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF achieves performance comparable with state-of-the-art prediction algorithms while using a small number of decision rules. Some of the decision rules are biologically significant.
  • Keywords
    biology computing; decision trees; drug delivery systems; feature extraction; knowledge based systems; pattern classification; 1-norm regularized random forests; biology related research problems; combined rule extraction; drug activity prediction; feature elimination; feature selection; microarray data sets; predictive model; supervised classification; Accuracy; Decision trees; Encoding; Feature extraction; Prediction algorithms; Radio frequency; Support vector machines; Rule extraction; feature selection; multi-class classification; random forests; Algorithms; Artificial Intelligence; Computational Biology; Databases, Factual; Decision Trees; Humans; Models, Theoretical; Neoplasms; Oligonucleotide Array Sequence Analysis; P-Glycoprotein; Receptors, Cannabinoid; Reproducibility of Results
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2012.2213264
  • Filename
    6298044