Regional Information Center for Science and Technology - Weighted amino acid composition based on amino acid indices for prediction of protein structural classes

DocumentCode :

3228063

Title :

Weighted amino acid composition based on amino acid indices for prediction of protein structural classes

Author :

Nanuwa, Sundeep Singh ; Dziurla, André ; Seker, Huseyin

Author_Institution :

Dept. of Inf., De Montfort Univ., Leicester, UK

fYear :

2009

fDate :

4-7 Nov. 2009

Firstpage :

Lastpage :

Abstract :

Prediction of protein structural classes is one of the most important and challenging tasks in the bioinformatics field. A protein is classified into one of four main structural classes: all-α, all-β, α/β and α+β. This paper investigates the role of amino acid indices (AAI) combined with traditional amino acid composition (AAC) to create a weighted amino acid composition (WAAC) feature set for predicting the structural class of a protein. Over 500 amino acid indices can be used to build the novel WAAC feature set, which has great potential to increase the accuracy of protein structural class prediction. To evaluate these indices, a high-quality 40% homology dataset is used; it contains over 7000 protein sequences (the largest of its kind) extracted from proteomic databases. The predictive technique developed is an optimum k-nearest-neighbour classifier, named multiple-k-nearest-neighbour (MKNN). To evaluate the classifier, a 10-fold cross-validation test procedure is used throughout the study. Over 1 million analyses were carried out; the highest accuracy, 48.35%, was obtained with index LEVM780101, which is 9% higher than traditional AAC and 6.6% higher than the best sequence-driven feature subset used in other studies. There is great potential for further improvement, as WAAC is a feature set with the fewest attributes and uses no feature selection. Of the 548 amino acid indices analysed in this study, 536 yielded higher accuracies than traditional AAC and 435 yielded higher accuracies than other sequence-driven features.
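The abstract names the ingredients (AAC, an amino acid index, WAAC, MKNN, 10-fold cross-validation) but not the exact formulas. The sketch below illustrates one plausible reading under loudly stated assumptions: WAAC is taken to be each residue's AAC fraction multiplied by that residue's value in a single amino acid index, and "multiple-k" is taken to mean running Euclidean k-NN under 10-fold cross-validation for several candidate k values and keeping the most accurate one. All function names (`aac`, `waac`, `knn_predict`, `cross_validated_accuracy`, `best_k`), the candidate k list and the toy index values are hypothetical; this is not the authors' code, and the toy index is not real LEVM780101 data.

```python
from collections import Counter
from math import dist

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Traditional amino acid composition: fraction of each standard residue."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]  # drop X, B, Z, etc.
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def waac(sequence: str, index: dict[str, float]) -> list[float]:
    """ASSUMED weighting: each AAC fraction multiplied by the residue's index value."""
    return [fraction * index[a] for fraction, a in zip(aac(sequence), AMINO_ACIDS)]


def knn_predict(train, query, k):
    """Majority vote among the k training vectors closest (Euclidean) to query."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # Ties go to the label encountered first, i.e. the one with the nearest member.
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(data, k, folds=10):
    """k-NN accuracy under round-robin n-fold cross-validation."""
    if len(data) < folds:
        raise ValueError("need at least one sample per fold")
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th sample, starting at offset f
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def best_k(data, candidate_ks=(1, 3, 5, 7, 9), folds=10):
    """ASSUMED reading of 'multiple-k': try several k, keep the most accurate."""
    scores = {k: cross_validated_accuracy(data, k, folds) for k in candidate_ks}
    k = max(scores, key=scores.get)  # the first listed k wins ties
    return k, scores[k]


if __name__ == "__main__":
    # Hypothetical toy index: NOT real LEVM780101 values.
    toy_index = {a: 1.0 for a in AMINO_ACIDS}
    toy_index["A"] = 2.0
    print(waac("AAG", toy_index))
```

Round-robin fold assignment keeps the sketch short; stratified folds that preserve the class proportions of all-α, all-β, α/β and α+β proteins would be a closer match to a careful evaluation on an imbalanced dataset.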

Keywords :

bioinformatics; feature extraction; molecular biophysics; molecular configurations; pattern classification; proteins; proteomics; LEVM780101; amino acid indices; bioinformatics; cross-validation test procedure; feature selection; homology dataset; multiple-k-nearest-neighbour; optimum k-nearest-neighbour classifier; protein sequences; protein structural class prediction; proteomic databases; weighted amino acid composition; Accuracy; Amino acids; Bioinformatics; Drugs; Informatics; Information technology; Protein engineering; Proteomics; Spatial databases; Testing; ASTRAL; Amino acid scales; LEVM780101; multiple k-nearest-neighbour; pseudo amino acid composition; weighted amino acid composition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Technology and Applications in Biomedicine, 2009. ITAB 2009. 9th International Conference on

Conference_Location :

Larnaca

Print_ISBN :

978-1-4244-5379-5

Electronic_ISBN :

978-1-4244-5379-5

Type :

conf

DOI :

10.1109/ITAB.2009.5394398

Filename :

5394398

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3228063