• DocumentCode
    21709
  • Title
    ResSeq: Enhancing Short-Read Sequencing Alignment By Rescuing Error-Containing Reads
  • Author
    Weixing Feng ; Peichao Sang ; Deyuan Lian ; Yansheng Dong ; Fengfei Song ; Meng Li ; Bo He ; Fenglin Cao ; Yunlong Liu
  • Author_Institution
    Harbin Eng. Univ., Harbin, China
  • Volume
    12
  • Issue
    4
  • fYear
    2015
  • fDate
    July-Aug. 1 2015
  • Firstpage
    795
  • Lastpage
    798
  • Abstract
    Next-generation short-read sequencing is widely used in genomic studies. Biological applications require an alignment step that maps sequencing reads to the reference genome before the expected genomic information can be obtained, which makes alignment accuracy a key factor for effective biological interpretation. Because of measurement errors and single nucleotide polymorphisms, short-read mappings with a few mismatches are generally considered acceptable. To further improve the efficiency of short-read sequencing alignment, we propose a Bayesian-based method that retrieves additional reliably aligned reads from among the reads with more than a pre-defined number of mismatches. In this method, we first retrieve the sequence context around the mismatched nucleotides within the already aligned reads; these loci contain the genomic features where sequencing errors occur. Then, using the derived pattern, we evaluate the remaining (typically discarded) reads with more than the allowed number of mismatches and calculate a score that represents the probability that a specific alignment is correct. This strategy allows the extraction of more reliably aligned reads, thereby improving alignment sensitivity. Implementation: The source code of our tool, ResSeq, can be downloaded from https://github.com/hrbeubiocenter/Resseq.
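The two steps described in the abstract (learn a mismatch pattern from accepted alignments, then score the discarded ones) can be illustrated with a toy sketch. This is not the ResSeq implementation: the context definition (the preceding reference base plus the current base), the pseudocount, the 0.5 prior, the 0.75 mismatch rate assumed for a wrong alignment, and the function names `learn_context_error_rates` and `posterior_correct` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def learn_context_error_rates(aligned_pairs, k=1, pseudocount=1.0):
    """Estimate P(mismatch | reference context) from accepted alignments.

    aligned_pairs: iterable of (read, reference) strings of equal length.
    Context (an assumption of this sketch): the k reference bases before
    a position plus the reference base at that position.
    """
    mismatches, totals = Counter(), Counter()
    for read, ref in aligned_pairs:
        for i in range(k, len(ref)):
            ctx = ref[i - k:i + 1]
            totals[ctx] += 1
            if read[i] != ref[i]:
                mismatches[ctx] += 1
    # Smoothed rates stay strictly between 0 and 1, so logs are safe.
    rates = {
        ctx: (mismatches[ctx] + pseudocount) / (totals[ctx] + 2 * pseudocount)
        for ctx in totals
    }
    overall = (sum(mismatches.values()) + pseudocount) / (
        sum(totals.values()) + 2 * pseudocount
    )
    return rates, overall


def posterior_correct(read, ref, rates, default_rate, k=1,
                      prior_correct=0.5, random_mismatch=0.75):
    """Posterior probability that an alignment is correct (toy Bayes rule).

    Correct alignment: mismatches occur at the learned context rates.
    Wrong alignment: each base mismatches with probability random_mismatch
    (an assumed value for uniformly random bases).
    """
    log_correct = math.log(prior_correct)
    log_wrong = math.log(1.0 - prior_correct)
    for i in range(k, len(ref)):
        rate = rates.get(ref[i - k:i + 1], default_rate)
        if read[i] != ref[i]:
            log_correct += math.log(rate)
            log_wrong += math.log(random_mismatch)
        else:
            log_correct += math.log(1.0 - rate)
            log_wrong += math.log(1.0 - random_mismatch)
    # Normalise in log space to avoid underflow on long reads.
    top = max(log_correct, log_wrong)
    p_correct = math.exp(log_correct - top)
    p_wrong = math.exp(log_wrong - top)
    return p_correct / (p_correct + p_wrong)


# Toy training set: errors cluster on the second base of a "GG" context.
REF = "ACGGTACGGT"
TRAINING = (
    [(REF, REF)] * 6
    + [("ACGTTACGGT", REF)] * 2
    + [("ACGGTACGTT", REF)] * 2
)
RATES, OVERALL = learn_context_error_rates(TRAINING)

# Two candidates with two mismatches each, in different contexts.
score_error_prone = posterior_correct("ACGTTACGTT", REF, RATES, OVERALL)
score_unusual = posterior_correct("AAGGTAAGGT", REF, RATES, OVERALL)
```

In this sketch, two candidate reads with the same number of mismatches receive different scores depending on whether their mismatches fall in contexts where errors were seen during training, which is the intuition the abstract describes for rescuing reads that a fixed mismatch threshold would discard.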
  • Keywords
    Bayes methods; bioinformatics; genomics; macromolecules; measurement errors; molecular biophysics; molecular configurations; polymorphism; probability; Bayesian-based approach; ResSeq; alignment sensitivity; biological applications; effective biological interpretation; enhancing short-read sequencing alignment; expected genomic information; genomic studies; map sequencing; mismatched nucleotides; next-generation short-read sequencing; rescuing error-containing reads; sequence context retrieval; sequencing errors; single nucleotide polymorphisms; source code; Educational institutions; Electronic mail; Reliability; Sequential analysis; Alignment; Error Analysis; Sequencing; Short-Read
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2366103
  • Filename
    6942207