DocumentCode :
2737707
Title :
Workshop: Inferring viral population from ultra-deep sequencing data
Author :
Astrovskaya, Irina
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
fYear :
2011
fDate :
3-5 Feb. 2011
Firstpage :
267
Lastpage :
267
Abstract :
Since existing high-throughput sequencing systems were originally designed for single-genome assembly, they cannot distinguish and simultaneously assemble multiple closely related sequences, nor estimate their relative abundances. This paper presents a novel approach, implemented in the ViSpA software, for quasispecies spectrum reconstruction. On simulated data, ViSpA accurately reconstructs up to 29 (out of 44) quasispecies in the absence of genotyping errors. ViSpA was also applied to real read data derived from a blood sample of an HCV-infected patient, processed by a Roche 454 Life Sciences machine. The sequenced region is half a genome long. The method reconstructed the 10 most frequent sequences, each of which represents a viable protein. The most frequent sequence is within 1% of the actual ORF obtained by cloning the quasispecies. ShoRAH was able to reconstruct only one sequence that represents a viable protein. This sequence has 99.94% similarity with the fourth most frequent assembly. Both methods returned similar frequency estimates for this sequence: 0.017% (ShoRAH) and 0.019% (ViSpA). The remaining top 9 quasispecies reconstructed by ShoRAH contain multiple stop codons in their corresponding amino-acid sequences, which indicates uncorrected systematic erroneous indels introduced by 454 Life Sciences machines. Additional experiments on 90% of the read data show that the ten most frequent assembled quasispecies are robustly reproduced by ViSpA.
Keywords :
biology computing; genomics; inference mechanisms; microorganisms; molecular biophysics; molecular configurations; proteins; ShoRAH; ViSpA software; blood sample; codons; genome; genotyping errors; high-throughput sequencing systems; inference; protein; quasispecies spectrum; quasispecies spectrum reconstruction; ultradeep sequencing data; viral population; Assembly; Bioinformatics; Estimation; Frequency estimation; Genomics; Human immunodeficiency virus; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-61284-851-8
Type :
conf
DOI :
10.1109/ICCABS.2011.5729921
Filename :
5729921