Title :
An exhaustive genome assembly algorithm using k-mers to indirectly perform N-squared comparisons in O(N)
Author :
Shah, Maulik K. ; Lee, HoJoon ; Rogers, Stephanie A. ; Touchman, Jeffrey W.
Author_Institution :
Arizona State Univ., Tempe, AZ, USA
Abstract :
We present an algorithm that indirectly makes N2 sequence comparisons in O(N) with respect to the size of the genome. This algorithm is very applicable in assembling whole genomes from the thousands of DNA sequence fragments that are generated in shotgun sequencing. First, we assume that fragments that share k-mers should overlap in the final assembly. We then catalog all k-mers that exist in the shotgun library and infer links between fragments that share k-mers. These links are then used to represent edges in a graph. This graph is generated in O(N) yet represents the result of comparing every fragment to every other fragment.
Keywords :
DNA; biology computing; genetics; graphs; molecular biophysics; DNA sequence; N-squared comparisons; O(N); exhaustive genome assembly algorithm; graph; k-mers; shotgun sequencing; Assembly; Bioinformatics; Capacitive sensors; DNA; Data structures; Genomics; Libraries; Sequences; Windows;
Conference_Titel :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
DOI :
10.1109/CSB.2004.1332565