• DocumentCode
    599152
  • Title
    A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition
  • Author
    Bonham-Carter, Oliver ; Ali, Hamza ; Bastola, Dhundy
  • Author_Institution
    Sch. of Interdiscipl. Inf., Univ. of Nebraska, Omaha, NE, USA
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    696
  • Lastpage
    703
  • Abstract
    Motivation: In meta-genome sequencing and assembly projects, where contigs of different types are mixed together in a single pool, assembling the different organisms is a complex and challenging problem. It is therefore desirable to sort the contigs by origin into separate bins from which to work. We propose a framework that uses the base compositions of bacterial restriction sites to generate sets of motifs that differentiate organismal groups, including the contigs from those groups. We introduce spectrum sets and show how to select them strategically for binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project. Results: Our method is able to differentiate organisms and to successfully associate contigs with the organism from which they were derived. In particular, by analyzing motif proportions, we show that two genera are fundamentally different. Using one of the four total spectrum sets, which encompass all known restriction sites, we show that different sets differ in their ability to distinguish sequences. In addition, we show that selecting a spectrum set that is relevant to one organism, but not the other, greatly improves differentiation performance, even when the contigs are short (1000 bp). Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective preprocessing step in meta-genome sequencing and assembly tasks.
  • Keywords
    DNA; genomics; molecular biophysics; molecular configurations; assembly preprocessing algorithm; bacterial restriction sites; contigs mixing; meta-genome sequencing; organismal groups; restriction site base composition; total spectrum sets; Assembly; Bioinformatics; DNA; Genomics; Heating; Microorganisms; base composition; palindromes; restriction sites; spectrum sets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2746-6
  • Electronic_ISBN
    978-1-4673-2744-2
  • Type
    conf
  • DOI
    10.1109/BIBMW.2012.6470222
  • Filename
    6470222