DocumentCode
1283095
Title
Combined Rule Extraction and Feature Elimination in Supervised Classification
Author
Sheng Liu ; Patel, R.Y. ; Daga, P.R. ; Haining Liu ; Gang Fu ; Doerksen, R.J. ; Yixin Chen ; Wilkins, D.E.
Author_Institution
Dept. of Comput. & Inf. Sci., Univ. of Mississippi, Oxford, MS, USA
Volume
11
Issue
3
fYear
2012
Firstpage
228
Lastpage
236
Abstract
Many biology-related research problems combine multiple sources of data to achieve a better understanding of the underlying problem. It is important to select and interpret the most relevant information from these sources, so an algorithm that simultaneously extracts rules and selects features would make the predictive model easier to interpret. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF achieves performance comparable to state-of-the-art prediction algorithms while using only a small number of decision rules, some of which are biologically significant.
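The idea of picking a few rules with a 1-norm penalty, then reading the selected features off those rules, can be illustrated with a small sketch. It is not the authors' implementation. The rules in `RULES` are hand-written stand-ins for paths that would normally be extracted from a random forest, the toy data are synthetic, and the logistic loss, the proximal-gradient solver, and all names and parameter values (`fit_l1_logistic`, `lam`, `lr`) are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random

# Hypothetical rules standing in for root-to-leaf paths of a random forest.
# Each rule is a conjunction of (feature_index, operator, threshold) tests.
RULES = [
    [(0, "<=", 0.5)],
    [(0, ">", 0.5), (1, "<=", 0.3)],
    [(2, ">", 0.7)],
    [(3, "<=", 0.2), (2, "<=", 0.7)],
]

def rule_fires(rule, x):
    """True if sample x satisfies every condition of the rule."""
    return all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in rule)

def encode(X, rules):
    """Map each sample to a binary vector: one entry per rule."""
    return [[1.0 if rule_fires(r, x) else 0.0 for r in rules] for x in X]

def soft_threshold(v, lam):
    """Proximal operator of the 1-norm: shrink v toward zero by lam."""
    return math.copysign(max(abs(v) - lam, 0.0), v)

def fit_l1_logistic(Z, y, lam=0.05, lr=0.5, n_iter=500):
    """1-norm penalised logistic regression via proximal gradient descent.
    Labels y are 0/1; the intercept b is not penalised."""
    n, m = len(Z), len(Z[0])
    w, b = [0.0] * m, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * m, 0.0
        for z, yi in zip(Z, y):
            score = b + sum(wj * zj for wj, zj in zip(w, z))
            err = 1.0 / (1.0 + math.exp(-score)) - yi
            grad_b += err / n
            for j in range(m):
                grad_w[j] += err * z[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def select(rules, w, tol=1e-8):
    """Keep rules with nonzero weight and the features those rules test."""
    kept = [i for i, wj in enumerate(w) if abs(wj) > tol]
    feats = sorted({f for i in kept for f, _, _ in rules[i]})
    return kept, feats

# Synthetic toy data: the label depends only on feature 0.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

w, b = fit_l1_logistic(encode(X, RULES), y)
kept, feats = select(RULES, w)
print("rule weights:", [round(wj, 3) for wj in w])
print("kept rules:", kept, "selected features:", feats)
```

Rules whose weight is driven to zero by the penalty are discarded, and any feature that appears only in discarded rules is eliminated with them.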
Keywords
biology computing; decision trees; drug delivery systems; feature extraction; knowledge based systems; pattern classification; 1-norm regularized random forests; biology related research problems; combined rule extraction; drug activity prediction; feature elimination; feature selection; microarray data sets; predictive model; supervised classification; Accuracy; Decision trees; Encoding; Feature extraction; Prediction algorithms; Radio frequency; Support vector machines; Rule extraction; feature selection; multi-class classification; random forests; Algorithms; Artificial Intelligence; Computational Biology; Databases, Factual; Decision Trees; Humans; Models, Theoretical; Neoplasms; Oligonucleotide Array Sequence Analysis; P-Glycoprotein; Receptors, Cannabinoid; Reproducibility of Results;
fLanguage
English
Journal_Title
NanoBioscience, IEEE Transactions on
Publisher
IEEE
ISSN
1536-1241
Type
jour
DOI
10.1109/TNB.2012.2213264
Filename
6298044