  • DocumentCode
    589244
  • Title
    Interaction Trees: Optimizing Ensembles of Decision Trees for Gene-Gene Interaction Detections
  • Author
    Assareh, A. ; Volkert, L.G. ; Jing Li
  • Author_Institution
    CS Dept., Kent State Univ., Kent, OH, USA
  • Volume
    1
  • fYear
    2012
  • fDate
    12-15 Dec. 2012
  • Firstpage
    616
  • Lastpage
    621
  • Abstract
    One of the main goals of genome-wide association studies (GWAS) has been to detect gene-gene interactions, also known as epistasis in a broad sense, underlying complex diseases. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies practically inapplicable. Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be effective strategies for detecting interactions in GWAS data. However, unlike other nodes, the root node of a decision tree is selected solely on the marginal effects of the candidate variables over the training data, which can greatly limit epistasis detection performance, especially when disease genotypes have low marginal effects. In this study, we show that modifying the selection criterion for the root node of each new tree joining the ensemble, so that it captures the interaction with the variable currently ranked best by the ensemble, leads to higher power in epistasis detection by decision tree ensembles. We demonstrate the efficacy of this idea using the three most popular decision tree ensemble algorithms: Bagging, Random Forest and AdaBoost. Our simulation studies using five two-locus epistasis models with low marginal effects show a considerable enhancement of the interaction detection power of all three ensemble strategies after applying the proposed modification.
  • Keywords
    bioinformatics; data analysis; decision trees; diseases; genetic engineering; genomics; learning (artificial intelligence); search problems; Adaboost; GWAS data; bagging; decision tree ensemble algorithm; disease genotype; epistasis detection performance; exponential complexity; five two-locus epistasis model; gene-gene interaction detection; genome wide association study; genotype data; interaction trees; nonlinear effect; random forest; root node selection; search space; training data; Boosting; Training; Vegetation; ensemble learning; epistasis models; interaction detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2012 11th International Conference on
  • Conference_Location
    Boca Raton, FL
  • Print_ISBN
    978-1-4673-4651-1
  • Type
    conf
  • DOI
    10.1109/ICMLA.2012.114
  • Filename
    6406635