Author/Authors :
Friedrichs, Stefanie University Medical Centre - Georg-August University Göttingen - Göttingen, Germany , Manitz, Juliane Department of Statistics and Econometrics - Georg-August University Göttingen - Göttingen, Germany , Burger, Patricia University Medical Centre - Georg-August University Göttingen - Göttingen, Germany , Amos, Christopher I Department of Community and Family Medicine - Geisel School of Medicine - Dartmouth College - Lebanon, USA , Risch, Angela University of Salzburg - Salzburg, Austria , Chang-Claude, Jenny German Cancer Research Center (DKFZ) - Heidelberg, Germany , Wichmann, Heinz-Erich Ludwig-Maximilians University - Munich, Germany , Kneib, Thomas Department of Statistics and Econometrics - Georg-August University Göttingen - Göttingen, Germany , Bickeböller, Heike University Medical Centre - Georg-August University Göttingen - Göttingen, Germany , Hofner, Benjamin Department of Medical Informatics - Biometry and Epidemiology - Friedrich-Alexander-Universität Erlangen-Nürnberg - Erlangen, Germany
Abstract :
The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets,
such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach
by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple
pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners
in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its
prediction ability. We evaluated our method in simulation studies using 50 pathways across different sample sizes and genetic
effect strengths. As an illustration, we also applied kernel boosting to a rheumatoid arthritis and a lung cancer
dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data
on rheumatoid arthritis and lung cancer resulted in sparse models based on clinically interpretable pathways.
Kernel boosting is highly flexible in terms of the variables considered and overcomes the multiple-testing problem. Additionally, it
enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool for the analysis of GWAS data
and for understanding the biological processes involved in disease susceptibility.
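
The iterative selection of pathways described in the abstract can be illustrated with a minimal sketch of component-wise gradient boosting that uses one kernel per pathway as a base-learner. This is not the authors' implementation: the function names (`linear_kernel`, `kernel_boost`), the ridge penalty `lam`, the step length `nu`, the plain linear kernel, and the simulated genotypes are all assumptions chosen for illustration.

```python
import numpy as np

def linear_kernel(G):
    """Linear genetic similarity kernel K = G G^T for an n x m genotype matrix."""
    return G @ G.T

def kernel_boost(kernels, y, n_iter=30, nu=0.1, lam=1.0):
    """Component-wise boosting for a binary outcome with kernel base-learners.

    kernels: list of n x n kernel matrices, one per pathway (hypothetical input).
    y:       array of 0/1 case-control labels.
    Returns the fitted log-odds and the pathway index chosen in each iteration.
    """
    n = len(y)
    # Ridge-penalised smoother per pathway: S_p = (K_p + lam I)^{-1} K_p
    smoothers = [np.linalg.solve(K + lam * np.eye(n), K) for K in kernels]
    p0 = y.mean()
    f = np.full(n, np.log(p0 / (1.0 - p0)))  # start from an intercept-only model
    selected = []
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-f))
        u = y - prob                          # negative gradient of the log-loss
        fits = [S @ u for S in smoothers]     # fit every pathway to the gradient
        rss = [np.sum((u - h) ** 2) for h in fits]
        best = int(np.argmin(rss))            # keep only the best-fitting pathway
        f = f + nu * fits[best]               # small step towards its fit
        selected.append(best)
    return f, selected

# Simulated example (assumed setup): 5 pathways of 10 SNPs each, 400 subjects,
# with only pathway 0 carrying a genetic effect.
rng = np.random.default_rng(0)
n, m, n_pathways = 400, 10, 5
genotypes = [rng.binomial(2, 0.3, size=(n, m)).astype(float) for _ in range(n_pathways)]
eta = genotypes[0][:, :3].sum(axis=1)
eta = eta - eta.mean()
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

kernels = [linear_kernel(G) for G in genotypes]
f, selected = kernel_boost(kernels, y)
print("pathway selected per iteration:", selected)
```

Pathways that are never selected drop out of the model, which is how sparsity arises in this sketch. In practice the number of iterations would be tuned, for example by cross-validation, rather than fixed in advance.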