• DocumentCode
    3582672
  • Title

    Efficient de Bruijn graph construction for genome assembly using a hash table and auxiliary vector data structures

  • Author

    Limon, Mahfuzer Rahman ; Sharker, Ratul ; Biswas, Sajib ; Rahman, M. Sohel

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Bangladesh Univ. of Eng. & Technol., Dhaka, Bangladesh
  • fYear
    2014
  • Firstpage
    121
  • Lastpage
    126
  • Abstract
    Modern next-generation sequencing technologies generate huge volumes of data. One popular and useful tool for analyzing this data is the so-called de Bruijn graph. Because of the huge number of nodes, memory and runtime are the main barriers in de Bruijn graph-based genome assembly, and this area has been the focus of significant attention in the contemporary literature. We present an algorithm that strikes a balance between memory and runtime. Our approach stores the de Bruijn graph in a hash table with an auxiliary data structure, which improves the total memory usage and runtime with no false positives. In the whole assembly process, the graph construction procedure generally takes the major share of the time. Our approach presents a significant advancement in this aspect. All the data files (in FASTA format) along with the program code are available for download at the following link: https://drive.google.com/folderview?id=0B3D-hZtRZ933V1dMOVBHUkNJM00&usp=sharing.
  • Keywords
    biology computing; data structures; genomics; graph theory; storage management; FASTA format; auxiliary vector data structures; data files; de Bruijn graph construction procedure; genome assembly; hash table; memory usage; next-generation sequencing technologies; program code; Assembly; Bioinformatics; Data structures; Genomics; Indexes; Runtime; Vectors; Computer Science; Hashtable; Vector; de Bruijn graph; genome assembly;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology (ICCIT), 2014 17th International Conference on
  • Type

    conf

  • DOI
    10.1109/ICCITechn.2014.7073147
  • Filename
    7073147
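
The abstract above describes storing a de Bruijn graph in a hash table backed by auxiliary vectors, but this record does not give the details of the authors' layout. The following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a hash table that maps each k-mer to a position, and two parallel vectors indexed by that position. All names here (`build_de_bruijn`, `successors`, `out_edges`) are hypothetical. Because every k-mer is stored exactly, a lookup cannot report a k-mer that was never seen, which is one way to read "no false positives".

```python
from typing import Dict, List, Tuple

BASES = "ACGT"  # assumed alphabet; k-mers containing other symbols are skipped


def build_de_bruijn(reads: List[str], k: int) -> Tuple[Dict[str, int], List[str], List[int]]:
    """Build an exact k-mer de Bruijn graph from reads (illustrative layout only)."""
    index: Dict[str, int] = {}  # hash table: k-mer -> position in the vectors
    kmers: List[str] = []       # auxiliary vector: position -> k-mer
    out_edges: List[int] = []   # auxiliary vector: 4-bit mask of successor bases

    def node_id(kmer: str) -> int:
        # Return the position of kmer, registering it on first sight.
        pos = index.get(kmer)
        if pos is None:
            pos = len(kmers)
            index[kmer] = pos
            kmers.append(kmer)
            out_edges.append(0)
        return pos

    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if any(c not in BASES for c in kmer):
                continue
            src = node_id(kmer)
            # Record the base that follows this k-mer in the read, if any.
            if i + k < len(read) and read[i + k] in BASES:
                out_edges[src] |= 1 << BASES.index(read[i + k])
    return index, kmers, out_edges


def successors(kmer: str, index: Dict[str, int], out_edges: List[int]) -> List[str]:
    """List the k-mers reachable from kmer by one edge; empty if kmer is absent."""
    pos = index.get(kmer)
    if pos is None:
        return []
    mask = out_edges[pos]
    return [kmer[1:] + b for j, b in enumerate(BASES) if mask & (1 << j)]


index, kmers, out_edges = build_de_bruijn(["ACGTAC", "CGTT"], k=3)
cgt_next = successors("CGT", index, out_edges)
```

In this sketch each node costs one hash-table entry plus one small integer for its edges, so an edge is recovered from the bit mask rather than stored as a separate object.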