Title :
Protein secondary structure prediction using support vector machines and a codon encoding scheme
Author :
Zamani, Mahdi ; Kremer, S.C.
Author_Institution :
Sch. of Comput. Sci., Univ. of Guelph, Guelph, ON, Canada
Abstract :
In this study, we evaluate the performance of a protein secondary structure prediction model that uses a new amino acid "codon" encoding inspired by genetic codon mappings. The dimensionality of the binary codon encoding is lower than that of an orthogonal encoding, so it requires fewer computations. Protein secondary structure prediction is an important step for machine learning techniques that are ultimately applied to protein 3D structure prediction. In the proposed model, one-stage binary support vector machines are employed, and the efficiency of the codon encoding is compared with that of a commonly used orthogonal encoding; protein evolutionary and structural information is not incorporated, so that the comparison is unbiased. The performance of the classification model is measured by Q3 and segment overlap (SOV) scores. The scores are compared with those of prediction methods that use an orthogonal encoding and protein sequence profiles. The experimental results indicate higher prediction accuracy, based on Q3 and SOV scores, when sequence profiles are not used. Also, the relative classification scores of the proposed method are comparable with those of methods incorporating protein global and evolutionary information. The experimental results suggest that the encoding scheme is able to integrate evolutionary information into the prediction model, since the encoding is based on genetic codon mappings, which are the building blocks of amino acid formation at the primary level of biological processes. The codon encoding is worth investigating with more complex learning architectures that use the profiles and structural properties of proteins.
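The Q3 score mentioned in the abstract is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `q3_score` and the example strings are made up for illustration.

```python
def q3_score(true_ss, pred_ss):
    """Percentage of residues whose H/E/C label is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    if not true_ss:
        raise ValueError("sequences must be non-empty")
    # Count positions where the observed and predicted states agree.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Hypothetical observed and predicted secondary structure strings.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"
print(q3_score(observed, predicted))  # 9 of 10 residues match -> 90.0
```

Q3 treats every residue independently, which is why the abstract also reports the segment overlap (SOV) score: SOV rewards predictions that recover whole helix and strand segments rather than isolated positions.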
Keywords :
biological techniques; biology computing; genetics; molecular configurations; proteins; proteomics; support vector machines; Q3 SOV scores; amino acid formations; binary codon encoding; biological processes; classification model performance; codon encoding scheme; genetic codon mappings; machine learning techniques; orthogonal encoding; protein 3D structure prediction; protein evolutionary information; protein global information; protein secondary structure prediction model; protein sequence profiles; protein structural information; segment overlap scores; Accuracy; Amino acids; Encoding; Prediction methods; Protein sequence; machine learning; protein secondary structure;
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2746-6
Electronic_ISBN :
978-1-4673-2744-2
DOI :
10.1109/BIBMW.2012.6470326