Title of article :
Automatic feature subset selection for decision tree-based ensemble methods in the prediction of bioactivity
Author/Authors :
Cao, Dong-Sheng; Xu, Qingsong; Liang, Yi-Zeng; Chen, Xian; Li, Hong-Dong
Issue Information :
Biannual, with serial issue number, year 2010
Pages :
8
From page :
129
To page :
136
Abstract :
In structure–activity relationship (SAR) studies, a learning algorithm usually faces the problem of selecting a compact subset of descriptors related to the property of interest while ignoring the rest. This paper presents a new method of molecular descriptor selection that couples three commonly used decision tree (DT)-based ensemble methods with a backward elimination strategy (BES). The proposed method eliminates descriptor redundancy automatically and searches for a more compact descriptor subset tailored to DT-based ensemble methods. Six real SAR datasets related to different categorical bioactivities of compounds are used to evaluate the proposed method. The results indicate that DT-based ensemble methods coupled with BES, especially the boosting tree model, yield better classification performance for compounds related to ADMET.
Keywords :
feature selection , Bagging , Random Forest (RF) , Classification and regression tree (CART) , Ensemble Learning , Boosting
Journal title :
Chemometrics and Intelligent Laboratory Systems
Serial Year :
2010
Record number :
1489846