Title :
Motif extraction and protein classification
Author :
Kunik, Vered ; Solan, Zach ; Edelman, Shimon ; Ruppin, Eytan ; Horn, David
Author_Institution :
Sch. of Comput. Sci., Tel Aviv Univ., Israel
Abstract :
We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data-driven and finds motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductase class of enzymes, which contains approximately 7000 enzyme sequences, yields a relatively small set of motifs. This set spans a motif-space that an SVM classifier uses for functional classification of the enzymes. Classification based on MEX motifs outperforms two other SVM-based methods: SVMProt, which analyzes physicochemical properties computed from a protein's amino acid sequence, and an SVM applied to a Smith-Waterman distance matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs that support successful sequence-to-function classification.
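The abstract does not specify how sequences are mapped into the motif-space, so the following is only a minimal sketch under stated assumptions. It is not the MEX algorithm itself, and it assumes that a sequence is represented by a binary vector recording which motifs occur in it as exact contiguous substrings. It also shows the linear kernel (dot product) an SVM could be trained on; for binary vectors this counts the motifs two sequences share. The kernel used in the paper is not stated here. The motifs, sequences, and function names (motif_vector, linear_kernel, gram_matrix) are hypothetical.

```python
from typing import List, Sequence


def motif_vector(sequence: str, motifs: Sequence[str]) -> List[int]:
    """Map one sequence into motif-space.

    Assumed matching rule: entry i is 1 if motifs[i] occurs as an exact
    contiguous substring of the sequence, otherwise 0.
    """
    return [1 if motif in sequence else 0 for motif in motifs]


def linear_kernel(u: Sequence[int], v: Sequence[int]) -> int:
    """Dot product of two binary vectors, i.e. the number of shared motifs."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(sequences: Sequence[str], motifs: Sequence[str]) -> List[List[int]]:
    """Precomputed linear-kernel matrix that an SVM could be trained on."""
    vectors = [motif_vector(s, motifs) for s in sequences]
    return [[linear_kernel(u, v) for v in vectors] for u in vectors]


# Hypothetical motifs and toy sequences, for illustration only.
MOTIFS = ["GDSL", "HGG", "CPC"]
SEQUENCES = ["MKGDSLAHGGT", "MCPCHGGQ", "MAAAA"]

if __name__ == "__main__":
    for seq in SEQUENCES:
        print(seq, motif_vector(seq, MOTIFS))
    print(gram_matrix(SEQUENCES, MOTIFS))
```

With these toy inputs, the first two sequences share exactly one motif (HGG). The third sequence contains none of the motifs, so it maps to the zero vector.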
Keywords :
biochemistry; biology computing; enzymes; pattern classification; support vector machines; SVM classifier; SVMProt; Smith-Waterman distances matrix; amino acids; biological sequence data; meaningful motifs extraction; motif extraction algorithm; motif-space; oxidoreductases class; physical-chemical properties; proteins; sequence-to-function classification; unsupervised method; Astronomy; Biochemistry; Computer science; Data mining; Electrons; Physics; Proteins; Psychology; Support vector machine classification; Support vector machines; enzyme classification; motif extraction
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Proceedings. 2005 IEEE
Print_ISBN :
0-7695-2344-7
DOI :
10.1109/CSB.2005.39