Regional Information Center for Science and Technology - A Lazy Data Mining Approach for Protein Classification

DocumentCode :

1196054

Title :

A Lazy Data Mining Approach for Protein Classification

Author :

Merschmann, Luiz ; Plastino, Alexandre

Author_Institution :

Dept. of Comput. Sci., Univ. Fed. Fluminense, Niteroi

Volume :

Issue :

fYear :

2007

fDate :

3/1/2007

Firstpage :

Lastpage :

Abstract :

In this work, we propose a new computational technique for the protein classification problem: predicting the functional family of novel protein sequences from their motif composition. To improve on the results obtained with other known approaches, we propose a new data mining technique for protein classification based on Bayes' theorem, called highest subset probability (HiSP). To evaluate it, we use experimental datasets extracted from Prosite, a curated protein family database. The computational results show that the proposed method outperforms the other known methods on all tested datasets and looks very promising for problems with characteristics similar to the one addressed here. In addition, our experiments suggest that HiSP performs well on highly imbalanced datasets.

Keywords :

Bayes methods; biology computing; data mining; molecular biophysics; probability; proteins; Bayes theorem; Prosite; curated protein family database; highest subset probability; lazy data mining; motif composition; protein classification; protein sequences; Amino acids; Computer science; Data mining; Databases; Decision trees; Information resources; Learning automata; Learning systems; Protein sequence; Testing; Data mining; lazy learning; protein classification; Algorithms; Amino Acid Sequence; Database Management Systems; Databases, Protein; Information Storage and Retrieval; Molecular Sequence Data; Proteins; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid;

fLanguage :

English

Journal_Title :

NanoBioscience, IEEE Transactions on

Publisher :

IEEE

ISSN :

1536-1241

Type :

jour

DOI :

10.1109/TNB.2007.891910

Filename :

4118127

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1196054