Author_Institution :
Coll. of Math. & Inf. Sci., Xianyang Normal Univ., Xianyang, China
Abstract :
O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine and threonine sites. In this paper, a new PCA-LDA method is used to predict O-glycosylation sites under a range of window sizes (5, 7, 9, 11, 21, 31, 41, 51). The method combines principal component analysis (PCA) with linear discriminant analysis (LDA); we also call it hybrid discriminant analysis (HDA). A test protein sequence, encoded by sparse coding, is projected onto a one-dimensional subspace. The Mahalanobis distance between the projection and each class center is then calculated, and the sequence is assigned to the "nearest" class, which indicates whether a particular serine or threonine site is glycosylated. Experimental results show that the proposed HDA method is more effective and accurate, with a prediction accuracy of about 75% to 92.5%.
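The pipeline described in the abstract (sparse encoding, projection to one dimension, nearest class center by Mahalanobis distance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sparse_encode`, `fit_pca_lda`, and `predict`, the 20-letter amino acid alphabet, the number of PCA components, the small regularization term, and the use of per-class variance for the one-dimensional Mahalanobis distance are all hypothetical choices made for the example.

```python
import numpy as np

# Assumption: the 20 standard amino acids; any other symbol encodes as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(window):
    """One-hot (sparse) encode a peptide window into a flat vector of length 20 * len(window)."""
    vec = np.zeros((len(window), len(AMINO_ACIDS)))
    for i, residue in enumerate(window):
        j = AMINO_ACIDS.find(residue)
        if j >= 0:
            vec[i, j] = 1.0
    return vec.ravel()


def fit_pca_lda(X, y, n_components):
    """Reduce X with PCA, then find a two-class Fisher LDA direction (labels 0 and 1)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    Z = Xc @ P

    # Within-class scatter in the PCA subspace.
    m0 = Z[y == 0].mean(axis=0)
    m1 = Z[y == 1].mean(axis=0)
    Sw = np.zeros((n_components, n_components))
    for label, m in ((0, m0), (1, m1)):
        D = Z[y == label] - m
        Sw += D.T @ D
    Sw += 1e-6 * np.eye(n_components)  # assumption: small ridge for numerical stability

    # Fisher direction: maximizes between-class over within-class scatter.
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)

    # Class centers and variances of the one-dimensional projections.
    proj = Z @ w
    centers = np.array([proj[y == 0].mean(), proj[y == 1].mean()])
    variances = np.array([proj[y == 0].var(), proj[y == 1].var()]) + 1e-12
    return {"mean": mean, "P": P, "w": w, "centers": centers, "variances": variances}


def predict(model, X):
    """Assign each row of X to the class whose center is nearest in Mahalanobis distance."""
    proj = ((X - model["mean"]) @ model["P"]) @ model["w"]
    # In one dimension the squared Mahalanobis distance is (x - mu)^2 / sigma^2.
    d2 = (proj[:, None] - model["centers"]) ** 2 / model["variances"]
    return d2.argmin(axis=1)
```

Under these assumptions, a window centered on a candidate serine or threonine would be encoded with `sparse_encode`, a model fitted on labeled windows with `fit_pca_lda`, and new windows classified with `predict`, where label 1 stands for a glycosylated site and 0 for a non-glycosylated one.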
Keywords :
molecular configurations; principal component analysis; proteins; proteomics; HDA; Mahalanobis distance; O-linked glycosylation site prediction; PCA-LDA; hybrid discriminant analysis; mammalian protein glycosylation; protein sequence; serine site; threonine site; Accuracy; Amino acids; Educational institutions; Hybrid intelligent systems; Information science; Mathematics; Principal component analysis; Protein sequence; Support vector machines; Testing; HDA; classification; glycosylation; prediction; protein; sparse coding;