DocumentCode :
3714574
Title :
Cross-validation and cross-study validation of chronic lymphocytic leukemia with exome sequences and machine learning
Author :
Nihir Patel;Bharati Jhadav;Abdulrhman Aljouie;Usman Roshan
Author_Institution :
Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai Hospital, Hess Center for Science and Medicine, New York City, 10029, USA
Year :
2015
Firstpage :
1367
Lastpage :
1374
Abstract :
The era of genomics brings the potential of better DNA-based risk prediction and treatment. While genome-wide association studies have been studied extensively for risk prediction, the potential of whole exome data for this purpose is unclear. We explore this problem for chronic lymphocytic leukemia, for which one of the largest whole exome datasets (186 cases and 169 controls) is available from the NIH dbGaP database. We perform a standard next-generation sequencing procedure to obtain SNP variants on 153 cases and 144 controls after excluding samples with missing data. To evaluate their predictive power, we first conduct a cross-validation study with 50% training and 50% test splits on the full dataset, using the support vector machine as the classifier. There we obtain a mean accuracy of 82% with the top 20 SNPs ranked by the Pearson correlation coefficient. We then perform a cross-study validation on cases and controls from an external lymphoma study and on controls only from head and neck cancer and breast cancer studies (all obtained from NIH dbGaP). On the external dataset we obtain an accuracy of 70% with the top-ranked SNPs obtained from the original dataset. We also find that our top Pearson-ranked SNPs lie on genes previously implicated in this disease. Our study shows that even with a small sample size we can obtain moderate to high accuracy with exome sequences, which is encouraging for future work.
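The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the 0/1/2 genotype coding, the toy data, and the helper names `pearson`, `rank_snps`, and `split_half` are assumptions made for illustration. The sketch covers only the 50/50 split and the Pearson ranking; the support vector machine that would be trained on the selected columns is not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column (or constant labels) carries no signal
    return cov / math.sqrt(vx * vy)


def rank_snps(genotypes, labels, top_k):
    """Return indices of the top_k SNP columns by |Pearson r| with the labels."""
    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scores.append((abs(pearson(column, labels)), j))
    # Highest |r| first; ties broken by column index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:top_k]]


def split_half(n_samples, seed):
    """Random 50/50 split of sample indices into training and test halves."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    half = n_samples // 2
    return idx[:half], idx[half:]


# Hypothetical toy data: 8 samples, 4 SNPs coded 0/1/2 (assumed encoding).
# Column 2 tracks the case/control label; the others are mostly noise.
genotypes = [
    [0, 1, 2, 1],
    [1, 0, 2, 1],
    [2, 1, 2, 0],
    [0, 2, 1, 1],
    [1, 1, 0, 1],
    [2, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 2, 1, 1],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control

# Rank on the training half only, so the test half stays unseen.
train_idx, test_idx = split_half(len(labels), seed=0)
train_rows = [genotypes[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]
selected = rank_snps(train_rows, train_labels, top_k=2)
print("SNP columns selected on the training half:", selected)
```

Ranking on the training half alone keeps the test half from influencing feature selection, which would otherwise inflate the measured accuracy. Repeating the split with different seeds and averaging the test accuracy would give a mean figure of the kind the abstract reports.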
Keywords :
"Genomics","Bioinformatics","DNA","Correlation","Biological information theory"
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BIBM.2015.7359878
Filename :
7359878