DocumentCode :
3363061
Title :
Reducing the Space of Degenerate Patterns in Protein Remote Homology Detection
Author :
Comin, Matteo ; Verzotto, Davide
Author_Institution :
Dept. of Inf. Eng., Univ. of Padova, Padua, Italy
fYear :
2013
fDate :
26-30 Aug. 2013
Firstpage :
76
Lastpage :
80
Abstract :
In biology, the notion of a degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g. [FY]DPC[LIM][ASG]C[ASG], are in general represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their output often far exceeds the size of the original input, making it hard to manage and to interpret through refined analysis requiring manual inspection. In this article we discuss a characterization of degenerate patterns with character classes and introduce the concept of pattern priority, for comparing and ranking different patterns without gaps, together with the class of underlying patterns, which makes it possible to filter any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein sequences with remote homologies. Results show that our approach drastically reduces the number of patterns output by a tool for protein sequence analysis, while retaining the functional ones.
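Illustration (not part of the original record): a minimal sketch of what a degenerate pattern with character classes means in practice, using the PROSITE-style example quoted in the abstract. It assumes the bracket notation can be read directly as a regular-expression character class; the function names and the toy sequence are hypothetical, and the sketch does not implement the paper's pattern priority or underlying-pattern filtering.

```python
import re
from math import prod
from typing import List

# Degenerate pattern quoted in the abstract. Each bracketed group is a
# character class that accepts any one of the listed amino acids.
PATTERN = "[FY]DPC[LIM][ASG]C[ASG]"


def find_occurrences(pattern: str, sequence: str) -> List[int]:
    """Return 0-based start positions of all matches, overlaps included."""
    # Wrapping the pattern in a lookahead makes finditer report
    # overlapping occurrences instead of skipping past each match.
    regex = re.compile("(?=(" + pattern + "))")
    return [m.start() for m in regex.finditer(sequence)]


def count_instances(pattern: str) -> int:
    """Count the distinct concrete strings the pattern can match."""
    # Fixed positions contribute a factor of 1, so only the sizes of the
    # character classes matter.
    classes = re.findall(r"\[([A-Z]+)\]", pattern)
    return prod(len(c) for c in classes)


# Hypothetical toy sequence, not taken from any database.
toy_sequence = "MKFDPCLACAYDPCMGCSQ"

print(find_occurrences(PATTERN, toy_sequence))  # [2, 10]
print(count_instances(PATTERN))                 # 54
```

A single short pattern of this kind already stands for 54 concrete strings, which gives a rough sense of why the set of discovered degenerate patterns can grow far beyond the size of the input.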
Keywords :
bioinformatics; data mining; database management systems; genomics; pattern classification; proteins; PROSITE database; character classes; degenerate pattern characterization; degenerate pattern discovery; degenerate pattern space reduction; genomes; pattern priority; pattern ranking; protein active site patterns; protein remote homology detection; protein sequence analysis; Bioinformatics; Complexity theory; Databases; Genomics; Nickel; Proteins;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Applications (DEXA), 2013 24th International Workshop on
Conference_Location :
Los Alamitos, CA
ISSN :
1529-4188
Print_ISBN :
978-0-7695-5070-1
Type :
conf
DOI :
10.1109/DEXA.2013.36
Filename :
6621349