DocumentCode :
21709
Title :
ResSeq: Enhancing Short-Read Sequencing Alignment By Rescuing Error-Containing Reads
Author :
Weixing Feng ; Peichao Sang ; Deyuan Lian ; Yansheng Dong ; Fengfei Song ; Meng Li ; Bo He ; Fenglin Cao ; Yunlong Liu
Author_Institution :
Harbin Eng. Univ., Harbin, China
Volume :
12
Issue :
4
fYear :
2015
fDate :
July-Aug. 1 2015
Firstpage :
795
Lastpage :
798
Abstract :
Next-generation short-read sequencing is widely used in genomic studies. Biological applications require an alignment step that maps sequencing reads to the reference genome before the expected genomic information can be acquired, which makes alignment accuracy a key factor for effective biological interpretation. To account for measurement errors and single nucleotide polymorphisms, short-read mappings with a few mismatches are generally considered acceptable. To further improve the efficiency of short-read sequencing alignment, we propose a Bayesian-based method that retrieves additional reliably aligned reads from among those with more than a pre-defined number of mismatches. In this method, we first retrieve the sequence context around the mismatched nucleotides within the already aligned reads; these loci capture the genomic features where sequencing errors occur. Then, using the derived pattern, we evaluate the remaining (typically discarded) reads that exceed the allowed number of mismatches and calculate a score that represents the probability that a specific alignment is correct. This strategy allows more reliably aligned reads to be extracted, thereby improving alignment sensitivity. Implementation: The source code of our tool, ResSeq, can be downloaded from: https://github.com/hrbeubiocenter/Resseq.
Keywords :
Bayes methods; bioinformatics; genomics; macromolecules; measurement errors; molecular biophysics; molecular configurations; polymorphism; probability; Bayesian-based approach; ResSeq; alignment sensitivity; biological applications; effective biological interpretation; enhancing short-read sequencing alignment; expected genomic information; genomic studies; map sequencing; mismatched nucleotides; next-generation short-read sequencing; rescuing error-containing reads; sequence context retrieval; sequencing errors; single nucleotide polymorphisms; source code; Educational institutions; Electronic mail; Reliability; Sequential analysis; Alignment; Error Analysis; Sequencing; Short-Read
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2014.2366103
Filename :
6942207