Title :
Indexing Graphs for Path Queries with Applications in Genome Research
Author :
Sirén, Jouni ; Välimäki, Niko ; Mäkinen, Veli
Author_Institution :
Dept. of Comput. Sci., Univ. of Chile, Santiago, Chile
Abstract :
We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications of such extensions. We extend the Burrows-Wheeler transform (BWT) of strings to acyclic directed labeled graphs, to support path queries as an extension to substring searching. We develop, apply, and tailor this technique to a) read alignment on an extended BWT index of a graph representing a pan-genome, i.e., a reference genome and its known variants; and b) split-read alignment on an extended BWT index of a splicing graph. Other possible applications include probe/primer design, alignments to assembly graphs, and alignments to a phylogenetic tree of partial-order graphs. We report several experiments on the feasibility and applicability of the approach. On highly polymorphic genome regions in particular, our pan-genome index significantly improves alignment accuracy.
Keywords :
DNA; genomics; molecular biophysics; Burrows-Wheeler transform; acyclic directed labeled graphs; canonical sequence representation; extended BWT index; generic approach; genome research; graph representing pan-genome; high-polymorphic genome regions; indexing graphs; pan-genome index; partial-order graphs; path queries; phylogenetic tree; probe-primer design; read alignment; splicing graph; split-read alignment; Arrays; Automata; Bioinformatics; Genomics; Indexes; Transforms; Vectors; Pan-genome indexing; extended Burrows-Wheeler transform; graph indexing; variation calling
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2013.2297101