Regional Information Center for Science and Technology (RICeST) - Amino acid encoding schemes for machine learning methods

DocumentCode :

2765241

Title :

Amino acid encoding schemes for machine learning methods

Author :

Zamani, Masood ; Kremer, Stefan C.

Author_Institution :

Sch. of Comput. Sci., Univ. of Guelph, Guelph, ON, Canada

fYear :

2011

fDate :

12-15 Nov. 2011

Firstpage :

327

Lastpage :

333

Abstract :

In this paper, we investigate the efficiency of a number of commonly used amino acid encodings using artificial neural networks and substitution scoring matrices. An important step in many machine learning techniques applied in computational biology is encoding the symbolic data of protein sequences reasonably efficiently as numeric vector representations. This encoding can be based either on the physicochemical properties of the amino acids or on a generic numerical encoding. To be effective in the context of a machine learning system, an encoding must preserve information relevant to the problem at hand while discarding superfluous data. To this end, it is important to measure how well an encoding scheme conserves the underlying similarities and differences that exist among the amino acids. One way to evaluate the effectiveness of an amino acid encoding scheme is to compare it to the roles that amino acids are actually found to play in biological systems. A numerical representation of the similarities and differences between amino acids can be found in the substitution matrices commonly used for sequence alignment, since these matrices are based on measures of the interchangeability of amino acids in biological specimens. In this study, a new encoding scheme is also proposed, based on the genetic codon coding that occurs during protein synthesis. The experimental results indicate that it performs better than the other commonly used encodings.
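The distinction the abstract draws between a generic numerical encoding and a physicochemical one can be illustrated with a minimal Python sketch. It is not the authors' method or their proposed codon-based scheme, which this record does not describe. The function names (`one_hot`, `hydropathy`, `distance`, `encode_sequence`) are hypothetical, and the choice of the Kyte-Doolittle hydropathy index as the physicochemical property is an assumption made only for illustration.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as one example of a
# physicochemical property (an illustrative assumption, not from the paper).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(residue):
    """Generic encoding: a 20-dimensional indicator vector."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1.0
    return vec


def hydropathy(residue):
    """Physicochemical encoding: a single hydropathy value."""
    return [HYDROPATHY[residue]]


def distance(u, v):
    """Euclidean distance between two encoded residues."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def encode_sequence(sequence, encoder):
    """Concatenate the per-residue vectors of a protein sequence."""
    encoded = []
    for residue in sequence:
        encoded.extend(encoder(residue))
    return encoded


if __name__ == "__main__":
    # One-hot places every pair of distinct residues at the same distance,
    # so it carries no information about which residues are similar.
    print(distance(one_hot("L"), one_hot("I")))
    print(distance(one_hot("L"), one_hot("D")))
    # A property-based encoding separates similar from dissimilar pairs.
    print(distance(hydropathy("L"), hydropathy("I")))
    print(distance(hydropathy("L"), hydropathy("D")))
```

Under one-hot encoding, leucine is as far from isoleucine as it is from aspartate, whereas the hydropathy encoding places the two hydrophobic residues close together. Comparing such pairwise distances against substitution-matrix scores is one plausible way to quantify how well an encoding conserves amino acid similarity, in the spirit of the evaluation the abstract describes.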

Keywords :

biology computing; learning (artificial intelligence); neural nets; proteins; amino acid encoding scheme; artificial neural network; biological specimen; biological system; computational biology; generic numerical encoding; machine learning; numeric vector representation; physicochemical properties; protein sequence; protein synthesis; substitution scoring matrices; symbolic data; Amino acids; Approximation methods; Biological neural networks; Encoding; Matrices; Proteins; Training; amino acids; artificial neural networks; machine learning; substitution matrix;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on

Conference_Location :

Atlanta, GA

Print_ISBN :

978-1-4577-1612-6

Type :

conf

DOI :

10.1109/BIBMW.2011.6112394

Filename :

6112394

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2765241