Title :
Rule based regression and feature selection for biological data
Author :
Liu, Siyuan ; Dissanayake, Shamitha ; Patel, Surabhi ; Dang, Xin ; Mlsna, Todd ; Chen, Yuanfeng ; Wilkins, Dawn
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Mississippi, Oxford, MS, USA
Abstract :
Regression is widely used in biological problems involving continuous outcomes. Methods for building regression models range from linear models to more complex nonlinear ones. Linear regression techniques can identify linear correlations between input and output, but in many practical applications the relations are nonlinear. Nonlinear regression techniques can model these relations effectively. However, many models built with nonlinear techniques offer limited interpretability, a property that is crucial in many biological problems. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features, and is therefore able to provide a simple interpretation. We tested the approach on a seacoast chemical sensors dataset, a Stockori flowering time dataset, and three datasets from the UCI repository. The proposed approach constructs a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of conventional random forest regression. It shows strong potential, in both prediction performance and ease of interpretation, for studying nonlinear relationships in the data under study.
Keywords :
biology computing; chemical sensors; feature selection; random processes; regression analysis; 1-norm regularized random forests; Stockori flowering time dataset; UCI repository; biological data; biological problems; complex nonlinear models; generated random forests; linear correlations; nonlinear regression techniques; rule-based regression; seacoast chemical sensor dataset; Computed tomography; Feature extraction; Prediction algorithms; Predictive models; Radio frequency; Support vector machines; Vegetation; random forests; rule extraction; stability
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732533