DocumentCode :
1196054
Title :
A Lazy Data Mining Approach for Protein Classification
Author :
Merschmann, Luiz ; Plastino, Alexandre
Author_Institution :
Dept. of Comput. Sci., Univ. Fed. Fluminense, Niteroi
Volume :
6
Issue :
1
fYear :
2007
fDate :
3/1/2007
Firstpage :
36
Lastpage :
42
Abstract :
In this work, we propose a new computational technique to solve the protein classification problem. The goal is to predict the functional family of novel protein sequences based on their motif composition. In order to improve the results obtained with other known approaches, we propose a new data mining technique for protein classification based on Bayes' theorem, called highest subset probability (HiSP). To evaluate our proposal, datasets extracted from Prosite, a curated protein family database, are used as experimental datasets. The computational results have shown that the proposed method outperforms other known methods for all tested datasets and looks very promising for problems with characteristics similar to the problem addressed here. In addition, our experiments suggest that HiSP performs well on highly imbalanced datasets.
Keywords :
Bayes methods; biology computing; data mining; molecular biophysics; probability; proteins; Bayes theorem; Prosite; curated protein family database; highest subset probability; lazy data mining; motif composition; protein classification; protein sequences; Amino acids; Computer science; Databases; Decision trees; Information resources; Learning automata; Learning systems; Protein sequence; Testing; lazy learning; Algorithms; Amino Acid Sequence; Database Management Systems; Databases, Protein; Information Storage and Retrieval; Molecular Sequence Data; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2007.891910
Filename :
4118127