INDARE - An indexed DAG of regular expressions for selecting position frequency matrices

Author

Park, Meeyoung ; Sanghvi, Jubin ; Dinakarpandian, Deendayal

Author_Institution

Univ. of Missouri-Kansas City, Kansas City

fYear

2007

fDate

2-4 Nov. 2007

Firstpage

191

Lastpage

196

Abstract

The identification of putative motifs in biomolecular sequences or whole genomes/proteomes is frequently based on window-based scanning with position frequency matrices (PFMs). The exponential increase in the amount of sequence data and the growing number of patterns to be screened have resulted in a need for rapid screening methods. In recognition of this, we have developed the Indexed DAG of Regular Expressions Extractor (INDARE), a tool that dynamically extracts regular expressions (REs) for each PFM in the database and creates a directed acyclic graph (DAG) of REs. The INDARE-generated DAG is very effective in pruning the search space and easily outperforms the naive exhaustive sequential search approach. The method is general enough to be applicable to the identification of motifs in any domain.
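
The abstract does not say how REs are extracted or how the DAG is linked, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that each PFM column becomes a character class of the symbols whose relative frequency reaches a threshold, and that an edge runs from a more general RE to every RE it subsumes. The names `pfm_to_pattern`, `build_dag`, `candidate_pfms`, the `min_fraction` threshold, and the toy matrices are all hypothetical.

```python
import re

def pfm_to_pattern(pfm, min_fraction=0.2):
    """Assumed extraction rule: keep symbols with frequency >= min_fraction."""
    pattern = []
    for column in pfm:
        total = sum(column.values())
        allowed = frozenset(s for s, n in column.items()
                            if total and n / total >= min_fraction)
        # Fall back to every symbol in the column if none passes the threshold.
        pattern.append(allowed or frozenset(column))
    return tuple(pattern)

def pattern_to_regex(pattern):
    """Render one character class per column, e.g. [AG][CT][A]."""
    return re.compile("".join("[" + "".join(sorted(c)) + "]" for c in pattern))

def subsumes(general, specific):
    """True if every string matched by `specific` is also matched by `general`."""
    return (general != specific and len(general) == len(specific)
            and all(g >= s for g, s in zip(general, specific)))

def build_dag(pfms, min_fraction=0.2):
    """Group PFMs by extracted RE and link each RE to the REs it subsumes."""
    members = {}
    for name, pfm in pfms.items():
        members.setdefault(pfm_to_pattern(pfm, min_fraction), []).append(name)
    children = {p: [q for q in members if subsumes(p, q)] for p in members}
    has_parent = {q for kids in children.values() for q in kids}
    roots = [p for p in members if p not in has_parent]
    regexes = {p: pattern_to_regex(p) for p in members}
    return members, children, roots, regexes

def candidate_pfms(window, dag):
    """Return PFMs whose RE matches the window, pruning below failed nodes."""
    members, children, roots, regexes = dag
    hits, seen, stack = [], set(), list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if regexes[node].fullmatch(window):
            hits.extend(members[node])
            stack.extend(children[node])  # descend only when the parent matched
    return sorted(hits)

# Toy PFMs (hypothetical counts), one dict per column.
PFMS = {
    "loose":  [{"A": 5, "G": 5}, {"C": 5, "T": 5}, {"A": 10}],
    "strict": [{"A": 10}, {"C": 10}, {"A": 10}],
    "other":  [{"T": 10}, {"T": 10}, {"G": 10}],
}
DAG = build_dag(PFMS)
```

With these toy matrices, `candidate_pfms("ACA", DAG)` returns both `loose` and `strict`, while a window such as `"TTG"` fails the `loose` RE at the root, so `strict` is never tested. Only the surviving candidates would then need full PFM scoring, which is the pruning effect the abstract describes.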

Keywords

biology computing; molecular biophysics; INDARE tool; Indexed DAG of Regular Expressions Extractor; sequential search approach; biomolecular sequences; genomes; position frequency matrices; proteomes; Bioinformatics; Cities and towns; Computer science; Data mining; Databases; Frequency; Genomics; Informatics; Inverse problems; Pattern recognition

fLanguage

English

Publisher

IEEE

Conference_Title

Bioinformatics and Biomedicine Workshops, 2007. BIBMW 2007. IEEE International Conference on

Conference_Location

Fremont, CA

Print_ISBN

978-1-4244-1604-2

Type

conf

DOI

10.1109/BIBMW.2007.4425418

Filename

4425418