DocumentCode :
191051
Title :
New computational methods for assessing the genetic relatedness of close viral variants
Author :
Campo, David S. ; Dimitrova, Zoya ; Guo-Liang Xia ; Skums, Pavel ; Ganova-Raeva, Lilia ; Khudyakov, Yury
Author_Institution :
Centers for Disease Control & Prevention, Mol. Epidemiology & Bioinf. Lab., Atlanta, GA, USA
fYear :
2014
fDate :
2-4 June 2014
Firstpage :
1
Lastpage :
1
Abstract :
Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion, and other exposures to blood products. HCV outbreaks are difficult to detect and investigate because HCV infections can remain asymptomatic in >70% of infected persons for years, even decades. During the 5-year period of 2008-2013 in the United States, 18 hepatitis C outbreaks related to healthcare were reported to CDC. These outbreaks involved 223 associated cases, and more than 90,550 at-risk persons were notified for screening. The prevailing method for molecular detection of viral transmissions involves: (i) sequencing a small heterogeneous region of the HCV genome isolated from each patient serum, (ii) creating a phylogenetic tree using the sequences, and (iii) identifying transmission clusters as subtrees containing sequences related above a certain threshold. This framework has also been used for other heterogeneous pathogens such as hepatitis B virus (HBV) and human immunodeficiency virus. The present work identifies and solves two problems of molecular detection of transmissions during outbreak investigations using small genomic regions. The studied dataset included 1073 HCV genome sequences and 370 HBV genome sequences obtained from GenBank. The first problem is the selection of the genomic region most applicable for outbreak detection. Commonly, the chosen region is simply the most variable one. Variability is usually defined by creating a whole-genome multiple-sequence alignment and then calculating genetic heterogeneity using a sliding window of 300 to 500 nt (the most common amplicon size). The major problem with this approach is that it implicitly assumes that the region with the highest global genetic heterogeneity will also provide the highest discrimination among closely related variants.
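The sliding-window scan described above can be illustrated with a minimal sketch. It assumes that genetic heterogeneity is measured as nucleotide diversity (the mean pairwise proportion of differing sites), that the input is an already aligned set of equal-length sequences, and that gaps are treated as ordinary characters. The function names, the step size, and the toy alignment are hypothetical and are not taken from the work itself.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs or not seqs[0]:
        return 0.0
    length = len(seqs[0])
    total = sum(sum(a != b for a, b in zip(s, t)) / length for s, t in pairs)
    return total / len(pairs)


def sliding_window_diversity(alignment, window=300, step=50):
    """Return a list of (window_start, diversity) across the alignment."""
    length = len(alignment[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in alignment]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment (hypothetical data): the second half is more variable.
alignment = ["AAAAAAAA", "AAAAAAAT", "AAAATTTT"]
profile = sliding_window_diversity(alignment, window=4, step=4)
most_variable_start = max(profile, key=lambda item: item[1])[0]
```

On this toy input the window starting at position 4 has the highest diversity, so a purely global criterion would select it.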
We study the discrimination for each region by calculating the 'local genetic heterogeneity', defined as the nucleotide diversity of the region over the 10 nearest neighbors (defined using the whole genome), and averaged over all sequences. It was found that the local heterogeneity allows for discriminating between regions of a similar global heterogeneity. For instance, among several regions of high global heterogeneity, only the E1/E2 region had a high local genetic heterogeneity. For HBV, we found that the most discriminating region was the TP domain of the polymerase gene. This region is more suitable for detection of transmissions during outbreaks than the commonly used S gene. The second problem is that phylogenetic trees obtained using small genomic regions and entire genomes are frequently distinctly different from each other, which complicates the application of small regions to the detection of clusters of closely related viral variants consistent with transmission of a single viral strain among patients. To resolve this problem, we calculated the correlation between matrices of genetic distances among HCV amplicon-size genomic regions and whole-genome sequences. For these experiments, we used the E1/E2 genomic region, which is commonly applied for molecular detection of genetic relatedness among HCV strains. Considering the task as an optimization problem, we conducted a search for weights for each nucleotide position in a small region that yield the highest possible correlation between both matrices. We solved this optimization problem using a Genetic Algorithm, finding a marked improvement of the correlation between matrices from r = 0.785 to r = 0.905 in genotype 1a (15.3% improvement) and from r = 0.450 to r = 0.705 in genotype 1b (56.7% improvement).
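The local genetic heterogeneity defined in the abstract can be sketched as follows. This is one possible reading, not the authors' implementation: neighbors are ranked by whole-genome p-distance, the focal sequence is included in the group whose diversity is measured, and `k` stands in for the 10 neighbors mentioned in the text. All names and the toy genomes are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """Mean pairwise p-distance among a group of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def local_heterogeneity(genomes, start, end, k=10):
    """Average, over all sequences, of the diversity of region [start, end)
    within each sequence's k nearest whole-genome neighbors.
    Assumption: the focal sequence is part of the group."""
    per_sequence = []
    for i, focal in enumerate(genomes):
        others = [j for j in range(len(genomes)) if j != i]
        others.sort(key=lambda j: p_distance(focal, genomes[j]))
        group = [i] + others[:k]
        region_seqs = [genomes[j][start:end] for j in group]
        per_sequence.append(nucleotide_diversity(region_seqs))
    return sum(per_sequence) / len(per_sequence)


# Toy genomes (hypothetical): two tight pairs separated in the first half.
genomes = ["AAAAAAAA", "AAAAAAAT", "TTTTAAAA", "TTTTAAAT"]
global_first = nucleotide_diversity([g[0:4] for g in genomes])
global_second = nucleotide_diversity([g[4:8] for g in genomes])
local_first = local_heterogeneity(genomes, 0, 4, k=1)
local_second = local_heterogeneity(genomes, 4, 8, k=1)
```

In this toy case the first half has the higher global diversity but zero local heterogeneity, while the second half is the one that separates close relatives. This is the kind of mismatch the abstract describes.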
These new computational methods provide a measurable improvement of the selection and analysis of short genomic regions for assessing genetic relatedness among genetically close variants and can be applied to other heterogeneous pathogens.
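The position-weighting idea from the abstract can be illustrated by the objective that such a search would maximize. The sketch below assumes a weighted p-distance for the short region and a Pearson correlation over the upper triangle of the two distance matrices. The Genetic Algorithm itself is not shown, and the function names, weights, and toy genomes are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def weighted_distance(a, b, weights):
    """Sum of the weights at differing positions, normalized by total weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y) / sum(weights)


def fitness(weights, genomes, start, end):
    """Correlation between weighted region distances and whole-genome distances."""
    pairs = list(combinations(range(len(genomes)), 2))
    region = [
        weighted_distance(genomes[i][start:end], genomes[j][start:end], weights)
        for i, j in pairs
    ]
    whole = [
        sum(x != y for x, y in zip(genomes[i], genomes[j])) / len(genomes[i])
        for i, j in pairs
    ]
    return pearson(region, whole)


# Toy genomes (hypothetical): position 1 of the region tracks whole-genome divergence.
genomes = ["AAAAAAAA", "TAAAAAAA", "ATAATTTT", "TTAATTTT"]
r_uniform = fitness([1, 1, 1, 1], genomes, 0, 4)
r_weighted = fitness([1, 5, 1, 1], genomes, 0, 4)
```

A search procedure such as a genetic algorithm would propose candidate weight vectors and keep those with higher `fitness`. Here, up-weighting the informative position raises the correlation from about 0.65 to 1.0 on the toy data.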
Keywords :
diseases; evolution (biological); genetic algorithms; genetics; genomics; health care; medical computing; microorganisms; molecular biophysics; patient diagnosis; E1/E2 genomic region; GenBank; Genetic Algorithm; HBV genome sequences; HCV amplicon-size genomic regions; HCV genome sequences; HCV infections; HCV strains; Hepatitis C outbreaks; S gene; TP domain; United States; blood products; closely related genetic viral variants; cluster detection; computational methods; discrimination; drug diversion; genetic distances; genetic relatedness; genotype 1a; genotype 1b; global genetic heterogeneity; global heterogeneity; healthcare; hepatitis B virus; hepatitis C virus infections; heterogeneous pathogens; human immunodeficiency virus; local genetic heterogeneity; local heterogeneity; matrices; molecular transmission detection; nucleotide diversity; nucleotide position; optimization problem; outbreak detection; patient serum; phylogenetic trees; polymerase gene; public health problem; screening; short genomic region analysis; short genomic region selection; single viral strain transmission; sliding window; small genomic regions; small heterogeneous region sequencing; subtrees; transmission cluster identification; unsafe injection practices; viral transmissions; whole genome multiple-sequence alignment; whole-genome sequences; Bioinformatics; Correlation; Genomics; Pathogens; Phylogeny; Strain; Genetic algorithms; Genetic relatedness; Viral Hepatitis; Viral quasispecies;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
Type :
conf
DOI :
10.1109/ICCABS.2014.6863937
Filename :
6863937