DocumentCode :
1448439
Title :
BpMatch: An Efficient Algorithm for a Segmental Analysis of Genomic Sequences
Author :
Felicioli, Claudio ; Marangoni, Roberto
Author_Institution :
Noname Res., Pisa, Italy
Volume :
9
Issue :
4
fYear :
2012
Firstpage :
1120
Lastpage :
1127
Abstract :
Here, we propose BpMatch: an algorithm that, working on a suitably modified suffix-tree data structure, is able to compute, in a fast and efficient way, the coverage of a source sequence S on a target sequence T, by taking into account direct and reverse segments, eventually overlapped. Using BpMatch, the operator should define a priori, the minimum length l of a segment and the minimum number of occurrences minRep, so that only segments longer than l and having a number of occurrences greater than minRep are considered to be significant. BpMatch outputs the significant segments found and the computed segment-based distance. On the worst case, assuming the alphabet dimension d is a constant, the time required by BpMatch to calculate the coverage is O(l2n). On the average, by setting l≥ 2 logd(n), the time required to calculate the coverage is only O(n). BpMatch, thanks to the minRep parameter, can also be used to perform a self-covering: to cover a sequence using segments coming from itself, by avoiding the trivial solution of having a single segment coincident with the whole sequence. The result of the self-covering approach is a spectral representation of the repeats contained in the sequence. BpMatch is freely available on: www.sourceforge.net/projects/bpmatch/.
Keywords :
bioinformatics; data structures; genomics; molecular biophysics; molecular configurations; trees (mathematics); BpMatch; alphabet dimension; direct segments; genomic sequence segmental analysis algorithm; reverse segments; segment based distance; self covering; source sequence coverage; suffix tree data structure; target sequence; Algorithm design and analysis; Bioinformatics; Complexity theory; Evolution (biology); Genomics; Image segmentation; Segmental analysis; coverage index.; genomic sequences; inverted repeats; repeats; Algorithms; Animals; Base Sequence; Chromosomes, Human; Genome; Genomics; Humans; Internet; Mice; Models, Genetic; Molecular Sequence Data; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA; Software;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
ieee
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2012.30
Filename :
6152085
Link To Document :
بازگشت