• DocumentCode
    2452350
  • Title
    A New Method for Mapping Short DNA Sequencing Reads by Using Quality Scores
  • Author
    Ozer, Hatice Gulcin ; Camerlengo, Terry ; Huang, Tim ; Huang, Kun
  • Author_Institution
    Comprehensive Cancer Center Biomed. Inf. Shared Resources, Ohio State Univ., Columbus, OH, USA
  • fYear
    2009
  • fDate
    15-17 June 2009
  • Firstpage
    21
  • Lastpage
    25
  • Abstract
    New high-throughput sequencing technologies can generate millions of short DNA sequences that need to be mapped accurately to the reference genome. The majority of mapping algorithms handle variations in the quality of these short sequences by allowing more mismatches and/or gaps in the alignment, and focus on improving runtime. In this paper, we investigate ways to classify the quality scores of short DNA sequencing reads and integrate them into the mapping process. We specifically studied quality scores that suggest two alternate bases (the top quality scores for two bases are close to each other at the locus) and the use of such bases to improve mapping accuracy. Our method generates alternative sequences when a sequence read contains alternate-quality bases and maps these alternative sequences to the reference genome. In a test using ChIP-seq data from an epigenetic study, we generated and mapped alternatives of 222,755 sequence reads (out of the original 2.5 million reads) that could not be mapped to the reference genome by the Eland algorithm. With this approach we were able to map 12.8% of these sequence reads with alternative bases to unique positions in the genome. This study demonstrates that the use of alternative bases in mapping algorithms can improve mapping results dramatically.
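    The generation of alternative sequences described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each read position carries four quality scores, one per base in the order A, C, G, T. The function name `alternative_reads`, the closeness threshold `max_gap`, and the cap `max_ambiguous` are hypothetical choices made for illustration; the abstract does not state how "close" two scores must be or how many ambiguous positions are allowed.

```python
from itertools import product

BASES = "ACGT"


def alternative_reads(scores, max_gap=5, max_ambiguous=3):
    """Enumerate candidate sequences for one read.

    scores: list of 4-tuples, the per-position quality for A, C, G, T
            (an assumed input layout, not taken from the paper).
    max_gap: assumed threshold; if the two best scores at a position
             differ by at most this much, both bases are kept.
    max_ambiguous: assumed cap on ambiguous positions, so the number of
                   alternatives (2 ** ambiguous) stays small.
    """
    choices = []
    for pos_scores in scores:
        # Rank base indices by descending score; ties keep A, C, G, T order.
        ranked = sorted(range(4), key=lambda i: -pos_scores[i])
        best, second = ranked[0], ranked[1]
        if pos_scores[best] - pos_scores[second] <= max_gap:
            choices.append((BASES[best], BASES[second]))
        else:
            choices.append((BASES[best],))

    n_ambiguous = sum(len(c) == 2 for c in choices)
    if n_ambiguous > max_ambiguous:
        # Too many uncertain positions: fall back to the primary call only.
        return ["".join(c[0] for c in choices)]

    # Cartesian product over per-position choices; the first result is
    # always the primary (top-scoring) base call.
    return ["".join(combo) for combo in product(*choices)]


# Toy read of length 3; the middle position has two close scores (A=10, C=8).
toy_scores = [(40, -40, -40, -40), (10, 8, -30, -30), (-40, -40, 40, -40)]
print(alternative_reads(toy_scores))  # ['AAG', 'ACG']
```

    Each returned sequence would then be passed to a mapper in place of the original read, and a read counts as rescued only when its alternatives map to a single position in the genome. The mapping step itself is outside this sketch.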
  • Keywords
    DNA; biology computing; genetics; ChIP-seq data; Eland algorithm; epigenetic study; genome; high-throughput sequencing technologies; mapping algorithms; quality scores; short DNA sequencing; Bioinformatics; Biomedical informatics; Cancer; Collaboration; Genomics; Pipelines; Sequences; Simple object access protocol; Solids; quality score; short sequence mapping
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics, 2009. OCCBIO '09. Ohio Collaborative Conference on
  • Conference_Location
    Cleveland, OH
  • Print_ISBN
    978-0-7695-3685-9
  • Type
    conf
  • DOI
    10.1109/OCCBIO.2009.35
  • Filename
    5159155