Regional Information Center for Science and Technology - An Effective Data Mining Technique for Classifying Unaligned Protein Sequences into Functional Families

DocumentCode :

2776881

Title :

An Effective Data Mining Technique for Classifying Unaligned Protein Sequences into Functional Families

Author :

Ma, Patrick C H ; Chan, Keith C C

Author_Institution :

The Hong Kong Polytechnic University, China

fYear :

2006

fDate :

Sept. 2006

Firstpage :

202

Lastpage :

202

Abstract :

To classify proteins into functional families based on their primary sequences, existing classification algorithms such as k-NN, HMM and SVM-based algorithms are often used. Most of these algorithms require the protein sequences to be properly aligned first. Since the alignment process is error-prone, classification accuracy can suffer. In addition to the need for accurate alignment, many existing approaches require additional techniques to decompose a multi-class protein classification problem into a number of binary problems. This may slow the learning process when the number of classes being handled is large. For these reasons, we propose an effective data mining technique in this paper. This technique has been applied to real protein sequence classification tasks. Experimental results show that it can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovered during the training process have been found to be biologically meaningful. These patterns can lead to a better understanding of protein functions and can also allow functionally significant structural features of different protein families to be better characterized.

Keywords :

Bioinformatics; Classification algorithms; Data mining; Evolution (biology); Genetic mutations; Genomics; Hidden Markov models; Protein engineering; Protein sequence; Time factors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer and Information Technology, 2006. CIT '06. The Sixth IEEE International Conference on

Conference_Location :

Seoul

Print_ISBN :

0-7695-2687-X

Type :

conf

DOI :

10.1109/CIT.2006.41

Filename :

4019975

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2776881