• DocumentCode
    191015
  • Title
    In search of perfect reads
  • Author
    Pal, Soumitra ; Aluru, Srinivas
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Bombay, Mumbai, India
  • fYear
    2014
  • fDate
    2-4 June 2014
  • Firstpage
    1
  • Lastpage
    2
  • Abstract
    Continued advances in next-generation short-read sequencing technologies are increasing throughput and read lengths while driving down error rates, for example to within 1% for Illumina HiSeq reads. Moreover, errors are not uniformly distributed across reads, and a large percentage of reads are in fact error-free. The ability to predict such perfect reads can have a significant impact on the run-time complexity of applications. In this paper, we present a simple and fast method, based on k-spectrum analysis, to identify error-free reads. Our experiments show that if around 80% of the reads in a dataset are perfect, our method retains almost 99.9% of them with a precision of more than 90%. Although filtering out the reads that our method identifies as erroneous reduces coverage by about 7% on average, the coverage pattern across the genome remains similar. The filtration process can be customized at several levels of stringency depending on the needs of the downstream application.
  • Keywords
    error analysis; filtration; genomics; error-free read identification; fast k-spectrum analysis based method; filtration process; next generation short-read sequencing technologies; Accuracy; Bioinformatics; Error correction; Next generation networking; Prediction algorithms; Sequential analysis; Next generation sequencing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4799-5786-6
  • Type
    conf
  • DOI
    10.1109/ICCABS.2014.6863919
  • Filename
    6863919
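
The abstract names k-spectrum analysis as the basis for identifying error-free reads but gives no algorithmic detail. As a rough illustration of the general idea only, and not the authors' method, the sketch below counts every k-mer across a read set and keeps a read when all of its k-mers occur at least `min_count` times. The function names, the values of `k` and `min_count`, and the all-k-mers-solid rule are assumptions made for this example.

```python
from collections import Counter


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_spectrum(reads, k):
    """Count how often each k-mer occurs across all reads (the k-spectrum)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def is_probably_perfect(read, spectrum, k, min_count):
    """Assumed rule: a read is likely error-free if every one of its
    k-mers occurs at least min_count times in the spectrum."""
    return all(spectrum[km] >= min_count for km in kmers(read, k))


def filter_reads(reads, k=4, min_count=2):
    """Return the reads predicted to be error-free under the assumed rule.

    k and min_count are hypothetical defaults chosen for the toy example.
    """
    spectrum = build_spectrum(reads, k)
    return [r for r in reads if is_probably_perfect(r, spectrum, k, min_count)]


if __name__ == "__main__":
    # Three identical reads plus one read with a single substitution.
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
    print(filter_reads(reads, k=4, min_count=2))
```

In this toy input the substituted read introduces k-mers that occur only once, so it is the only read dropped. Raising `min_count` makes the filter stricter, which loosely mirrors the adjustable stringency the abstract mentions.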