DocumentCode :
525623
Title :
Prediction of the protein O-glycosylation by machine learning and statistical characters around the glycosylation sites
Author :
Nishikawa, Ikuko ; Nakajima, Yukiko ; Sakakibara, Kazutoshi ; Ito, Masahiro
Author_Institution :
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear :
2010
fDate :
23-25 June 2010
Firstpage :
671
Lastpage :
674
Abstract :
O-glycosylation of mammalian proteins is investigated. It is specific to serine or threonine, though no consensus sequence is known. We have applied support vector machines (SVMs) to predict O-glycosylation sites from various kinds of protein information, aiming to investigate the conditions for glycosylation and to elucidate its mechanisms. In the present study, we first focus on the distribution of the glycosylation sites. Many O-glycosylated sites occur in clusters of closely spaced glycosylated sites, whereas the others are sparse or isolated. These two types, crowded and isolated sites, might have different glycosylation mechanisms. Therefore, we divide all O-glycosylation sites into a crowded group and an isolated group. For each group, an SVM is trained separately to predict the O-glycosylation sites. The prediction results of the two SVMs depend differently on the input information. The results indicate that some motifs are expected for the isolated group, while the interaction between the glycosylated sites and the relative proportion of the surrounding amino acids affect glycosylation for the crowded group. Then, we compare the statistics of the amino acid sequences around the glycosylation sites of both groups. As a result, some amino acids (proline, valine, alanine, etc.) are found to have high occurrence probabilities at specific positions relative to a glycosylation site, especially for isolated glycosylation. Moreover, independent component analysis of the amino acid sequences elucidates the position-specific occurrence of these amino acids, including the well-known proline at -1 and +3, which are found as different independent components. Finally, we investigate the relation between O-glycosylation and the domain structure or the disordered regions of the protein. O-glycosylation is clearly observed more frequently in the disordered regions and less frequently in the domains. This could be the key feature for understanding the non-conservation property of O-glycosylation and its role in functional diversity and structural stability.
Keywords :
bioinformatics; independent component analysis; learning (artificial intelligence); proteins; support vector machines; amino acid sequence elucidate position; functional diversity; machine learning; mammalian protein; protein O-glycosylation site prediction; statistical characters; structural stability; Amino acids; Educational institutions; Probability; Protein engineering; Protein sequence; Statistics; Intrinsically disordered; Protein glycosylation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-7324-3
Electronic_ISBN :
978-89-88678-22-0
Type :
conf
Filename :
5542837