Title of article :
Jackknife Model Averaging Prediction Methods for Complex Phenotypes with Gene Expression Levels by Integrating External Pathway Information
Author/Authors :
Yu, Xinghao Department of Epidemiology and Biostatistics - School of Public Health - Xuzhou Medical University - Xuzhou - Jiangsu, China , Xiao, Lishun Department of Epidemiology and Biostatistics - School of Public Health - Xuzhou Medical University - Xuzhou - Jiangsu, China , Zeng, Ping Department of Epidemiology and Biostatistics - School of Public Health - Xuzhou Medical University - Xuzhou - Jiangsu, China , Huang, Shuiping Department of Epidemiology and Biostatistics - School of Public Health - Xuzhou Medical University - Xuzhou - Jiangsu, China
Abstract :
In the past few years, many prediction approaches have been proposed and widely employed in high-dimensional
genetic data for disease risk evaluation. However, those approaches typically ignore, during model fitting, the important group
structures that naturally exist in genetic data. Methods. In the present study, we applied a novel model-averaging approach, called
jackknife model averaging prediction (JMAP), for high-dimensional genetic risk prediction while incorporating pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a cross-validation
criterion in a jackknife (leave-one-out) manner. Compared with previous approaches, a primary feature of JMAP is that each
model weight may vary from 0 to 1 without the constraint that the weights sum to one. We evaluated the
performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to four
real cancer datasets that are publicly available from TCGA. Results. The simulations showed that, compared with other existing
approaches (e.g., gsslasso), JMAP performed best or was among the best methods across a range of scenarios. For example, in 14
of the 16 simulation settings with PVE = 0.3, the prediction accuracy of JMAP was on average 0.075 higher than that of gsslasso.
We further found that, in the simulations, the model weights for the true candidate models are much less likely to be zero
than those for the null candidate models and are substantially greater in magnitude. In the real data application, JMAP
also performs comparably to or better than the other methods for continuous phenotypes. For example, for the COAD,
CRC, and PAAD datasets, the average gains in predictive accuracy of JMAP over gsslasso are 0.019, 0.064, and 0.052, respectively.
Conclusion. The proposed method, JMAP, is a novel model-averaging approach for high-dimensional genetic risk prediction that
incorporates useful external group structures into the model specification.
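
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one candidate model per pathway, fitted by ordinary least squares on centered data without an intercept (the abstract does not specify the candidate model), and the function names `loo_predictions` and `box_weights`, the simulated data, and the coordinate-descent solver are all hypothetical choices made for illustration. The sketch builds leave-one-out predictions for each candidate model and then picks weights in [0, 1], with no sum-to-one constraint, that minimize the squared leave-one-out error.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions of an OLS fit (no intercept, assumed centered data).

    Uses the hat-matrix identity y_i - yhat_(-i) = e_i / (1 - h_ii),
    so the model is fitted once instead of n times.
    """
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix of this candidate model
    resid = y - H @ y                        # in-sample residuals e_i
    return y - resid / (1.0 - np.diag(H))    # jackknife predictions yhat_(-i)


def box_weights(P, y, n_sweeps=200):
    """Minimize ||y - P w||^2 subject to 0 <= w_j <= 1 by coordinate descent.

    The weights are NOT forced to sum to one (illustrative solver, an assumption).
    """
    m = P.shape[1]
    w = np.full(m, 1.0 / m)
    r = y - P @ w                            # current residual
    col_sq = (P ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            r += P[:, j] * w[j]              # remove model j's contribution
            w[j] = np.clip(P[:, j] @ r / col_sq[j], 0.0, 1.0)
            r -= P[:, j] * w[j]              # add it back with the new weight
    return w


# Simulated example (hypothetical): 4 pathways of 10 genes each,
# only the first two pathways affect the phenotype.
rng = np.random.default_rng(0)
n, genes_per_pathway, n_pathways = 300, 10, 4
X = rng.standard_normal((n, genes_per_pathway * n_pathways))
groups = [np.arange(k * genes_per_pathway, (k + 1) * genes_per_pathway)
          for k in range(n_pathways)]
beta = np.zeros(X.shape[1])
beta[: 2 * genes_per_pathway] = 0.5
y = X @ beta + rng.standard_normal(n)

# One column of leave-one-out predictions per candidate (pathway) model.
P = np.column_stack([loo_predictions(X[:, g], y) for g in groups])
weights = box_weights(P, y)
print(np.round(weights, 3))
```

In this toy setting the two pathways that truly affect the phenotype should receive larger weights than the two null pathways, which mirrors the qualitative behaviour reported in the Results. A bounded least-squares routine such as `scipy.optimize.lsq_linear` with `bounds=(0, 1)` could replace the hand-written solver.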
Keywords :
Complex , Phenotypes , Gene , Pathway , JMAP
Journal title :
Computational and Mathematical Methods in Medicine