• DocumentCode
    3195802
  • Title
    Rule based regression and feature selection for biological data
  • Author
    Liu, Siyuan ; Dissanayake, Shamitha ; Patel, Surabhi ; Dang, Xin ; Mlsna, Todd ; Chen, Yuanfeng ; Wilkins, Dawn
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Mississippi, Oxford, MS, USA
  • fYear
    2013
  • fDate
    18-21 Dec. 2013
  • Firstpage
    446
  • Lastpage
    451
  • Abstract
    Regression is widely used in biological problems involving continuous outcomes. Methods for building regression models range from linear models to more complex nonlinear ones. While linear regression techniques can identify linear correlations between input and output, the relations in many practical applications are nonlinear. Nonlinear regression techniques can model these relations effectively. However, many models built with nonlinear techniques have limited interpretability, and interpretability is crucial in many biological problems. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features, and hence provides a simple interpretation. We tested the approach on a seacoast chemical sensors dataset, a Stockori flowering time dataset, and three datasets from the UCI repository. The proposed approach constructs a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of conventional random forests regression. It shows high potential, in both prediction performance and ease of interpretation, for studying nonlinear relationships in the subjects under study.
  • Keywords
    biology computing; chemical sensors; feature selection; random processes; regression analysis; 1-norm regularized random forests; Stockori flowering time dataset; UCI repository; biological data; biological problems; complex nonlinear models; feature selection; generated random forests; linear correlations; nonlinear regression techniques; rule-based regression; seacoast chemical sensor dataset; Computed tomography; Feature extraction; Prediction algorithms; Predictive models; Radio frequency; Support vector machines; Vegetation; Rule based regression; feature selection; random forests; rule extraction; stability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Type
    conf
  • DOI
    10.1109/BIBM.2013.6732533
  • Filename
    6732533