Title : 
Interaction Trees: Optimizing Ensembles of Decision Trees for Gene-Gene Interaction Detections
         
        
            Author : 
Assareh, A. ; Volkert, L.G. ; Jing Li
         
        
            Author_Institution : 
CS Dept., Kent State Univ., Kent, OH, USA
         
        
            Abstract : 
One of the main goals of genome-wide association studies (GWAS) has been to detect gene-gene interactions, also known as epistasis in a broad sense, underlying complex diseases. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies practically inapplicable. Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be effective strategies for detecting interactions in GWAS data. However, unlike other nodes, the root node of a decision tree is selected solely on the basis of the marginal effects of candidate variables over the training data, which can greatly limit epistasis detection performance, especially when disease genotypes have low marginal effects. In this study, we show that modifying the root-node selection criterion of each new tree joining the ensemble, so that it captures the interaction with the variable currently ranked best by the ensemble, leads to higher power in epistasis detection by decision tree ensembles. We demonstrate the efficacy of this idea using the three most popular decision tree ensemble algorithms: Bagging, Random Forest, and AdaBoost. Our simulation studies using five two-locus epistasis models with low marginal effects show a considerable enhancement in the interaction detection power of all three ensemble strategies after applying the proposed modification.
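The root-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`marginal_root`, `interaction_root`), the use of information gain as the split criterion, the binary toy genotypes, and the choice of `best_var=0` as the variable "ranked best by the ensemble" are all assumptions made for illustration. The sketch contrasts a root chosen by marginal gain with a root chosen by its joint gain together with the current best variable.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(keys, labels):
    """Information gain obtained by partitioning `labels` on `keys`."""
    groups = {}
    for k, y in zip(keys, labels):
        groups.setdefault(k, []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def marginal_root(X, y):
    """Standard choice: the variable with the highest marginal gain."""
    return max(range(len(X[0])), key=lambda j: info_gain([r[j] for r in X], y))


def interaction_root(X, y, best_var):
    """Hypothetical modified choice: the variable whose joint genotype with
    `best_var` (assumed to be the ensemble's current top-ranked variable)
    has the highest gain."""
    candidates = [j for j in range(len(X[0])) if j != best_var]
    return max(
        candidates,
        key=lambda j: info_gain([(r[best_var], r[j]) for r in X], y),
    )


# Toy data (an assumption, not one of the paper's five models):
# the label is a XOR c, so a and c have no marginal effect,
# while b is a weakly correlated decoy variable.
X, y = [], []
for a in (0, 1):
    for c in (0, 1):
        label = a ^ c
        for b in ([1, 1, 1, 0] if label else [0, 0, 0, 1]):
            X.append((a, b, c))
            y.append(label)

print(marginal_root(X, y))                 # prints 1 (the decoy b)
print(interaction_root(X, y, best_var=0))  # prints 2 (the partner c)
```

In this toy setting the marginal criterion is drawn to the decoy, whereas the joint criterion recovers the interacting partner of variable 0. How the ensemble ranks variables, and how the criterion is combined with Bagging, Random Forest, or AdaBoost, is not specified in the abstract and is left out of the sketch.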
         
        
            Keywords : 
bioinformatics; data analysis; decision trees; diseases; genetic engineering; genomics; learning (artificial intelligence); search problems; Adaboost; GWAS data; bagging; decision tree ensemble algorithm; disease genotype; epistasis detection performance; exponential complexity; five two-locus epistasis model; gene-gene interaction detection; genome wide association study; genotype data; interaction trees; nonlinear effect; random forest; root node selection; search space; training data; Boosting; Training; Vegetation; ensemble learning; epistasis models; interaction detection
         
        
            Conference_Title : 
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
         
        
            Conference_Location : 
Boca Raton, FL
         
        
            Print_ISBN : 
978-1-4673-4651-1
         
        
            DOI : 
10.1109/ICMLA.2012.114