DocumentCode
589244
Title
Interaction Trees: Optimizing Ensembles of Decision Trees for Gene-Gene Interaction Detections
Author
Assareh, A. ; Volkert, L.G. ; Jing Li
Author_Institution
CS Dept., Kent State Univ., Kent, OH, USA
Volume
1
fYear
2012
fDate
12-15 Dec. 2012
Firstpage
616
Lastpage
621
Abstract
One of the main goals of genome-wide association studies (GWAS) has been to detect the gene-gene interactions, known in a broad sense as epistasis, that underlie complex diseases. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies impractical. Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be useful strategies for detecting interactions in GWAS data. Unlike other nodes, however, the root node of a decision tree is selected solely on the marginal effects of the candidate variables over the training data, which can greatly limit epistasis detection performance, especially when disease genotypes have low marginal effects. In this study, we show that modifying the selection criterion for the root node of each new tree joining the ensemble, so that it captures the interaction with the variable currently ranked best by the ensemble, leads to higher power in epistasis detection by decision tree ensembles. We demonstrate the efficacy of this idea using the three most popular decision tree ensemble algorithms: Bagging, Random Forest, and AdaBoost. Our simulation studies, using five two-locus epistasis models with low marginal effects, show a considerable increase in the interaction detection power of all three ensemble strategies after applying the proposed modification.
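The root-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the scoring criterion, so the sketch assumes a Gini impurity decrease computed on the joint split of a candidate variable with the ensemble's current best variable. The function names (`gini`, `impurity_drop`, `interaction_root`), the binary genotype coding, and the XOR-style toy phenotype are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def impurity_drop(groups, labels):
    """Gini decrease when samples are partitioned by the keys in `groups`."""
    buckets = defaultdict(list)
    for g, y in zip(groups, labels):
        buckets[g].append(y)
    n = len(labels)
    weighted = sum(len(b) / n * gini(b) for b in buckets.values())
    return gini(labels) - weighted


def interaction_root(X, y, best_var):
    """Assumed criterion: pick the candidate whose joint split with
    `best_var` (the ensemble's current top-ranked variable) gives the
    largest impurity decrease, instead of using marginal effects."""
    candidates = [j for j in range(len(X[0])) if j != best_var]
    return max(
        candidates,
        key=lambda j: impurity_drop([(row[j], row[best_var]) for row in X], y),
    )


# Toy data: four binary variables in a balanced design. The phenotype
# depends only on the XOR of variables 0 and 1, so neither has a
# marginal effect on its own.
X = [list(row) for row in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]

marginal_scores = [impurity_drop([row[j] for row in X], y) for j in range(4)]
root = interaction_root(X, y, best_var=0)
```

In this toy design every marginal score is zero, so a purely marginal criterion cannot tell the causal variables from the noise variables, while the joint criterion with variable 0 as the assumed top-ranked variable singles out variable 1.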
Keywords
bioinformatics; data analysis; decision trees; diseases; genetic engineering; genomics; learning (artificial intelligence); search problems; Adaboost; GWAS data; bagging; decision tree ensemble algorithm; disease genotype; epistasis detection performance; exponential complexity; five two-locus epistasis model; gene-gene interaction detection; genome wide association study; genotype data; interaction trees; nonlinear effect; random forest; root node selection; search space; training data; Bagging; Boosting; Decision trees; Diseases; Training; Vegetation; decision trees; ensemble learning; epistasis models; interaction detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location
Boca Raton, FL
Print_ISBN
978-1-4673-4651-1
Type
conf
DOI
10.1109/ICMLA.2012.114
Filename
6406635