DocumentCode :
3739155
Title :
Predicting Clinical Phenotype Using OTU-Based Metagenome Representation
Author :
Nathan LaPierre;Huzefa Rangwala
Author_Institution :
Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
fYear :
2015
Firstpage :
156
Lastpage :
163
Abstract :
We demonstrate a computational method to predict the clinical phenotypes of a patient from raw metagenomic sequence read data. We compare two state-of-the-art programs for annotating the sequence data, UCLUST and Kraken, and use their output for feature generation. We apply these programs to a set of over 1.3 million reads from 904 patients who have liver cirrhosis, encephalopathy due to liver cirrhosis, or neither disease. Once the reads have been processed by UCLUST or Kraken, we use Support Vector Machines to set up the clinical phenotype prediction problem. We find that the classifier predicts too many false negatives. To address this issue, we scale features to improve the classification model and evaluate the end results on a held-out test set. We demonstrate that our approach works quickly and accurately, with an 85.64% success rate when we use the UCLUST representation. We also find that UCLUST generally performs better than Kraken, with the latter having an 80.66% success rate. Finally, we test our classifier on several subsets of the data, with success rates ranging from 69.81% to 96.72%.
Keywords :
"Genomics","Bioinformatics","Support vector machines","Assembly","Taxonomy","Databases","Liver"
Publisher :
IEEE
Conference_Titel :
Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
Electronic_ISBN :
2375-9259
Type :
conf
DOI :
10.1109/ICDMW.2015.155
Filename :
7395666