Title :
Evaluating Protein Motif Significance Measures: A Case Study on Prosite Patterns
Author :
Ferreira, Pedro Gabriel ; Azevedo, Paulo J.
Author_Institution :
Dept. of Informatics, Minho Univ., Braga
fDate :
March 1 2007-April 5 2007
Abstract :
The existence of preserved subsequences in a set of related protein sequences suggests that they might play a structural and functional role in protein mechanisms. Because motif mining is exploratory, it tends to deliver a large number of motifs. It is therefore critical to have methods that identify the relevant, significant motifs. Many measures of interest and significance have been proposed. However, since motifs have a wide range of applications, the choice of appropriate significance measures is application dependent. Some measures give consistent, highly correlated results, while others disagree. In this paper we review existing measures and study their behavior in order to assist the selection of the most appropriate set of measures. An experimental evaluation of the measures on high-quality patterns from the Prosite database is presented.
Keywords :
biology computing; data mining; proteins; sequences; Prosite database; Prosite patterns; protein motif significance measures; protein sequence mining; Computational intelligence; Data mining; Databases; Evolution (biology); Hidden Markov models; Informatics; Particle measurements; Pattern analysis; Protein sequence; Pulse width modulation
Conference_Titel :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368869