DocumentCode :
2528444
Title :
Mining protein sequence motifs representing common 3D structures
Author :
Zhong, Wei ; Altun, Gulsah ; Harrison, Robert ; Tai, Phang C. ; Pan, Yi
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
fYear :
2005
fDate :
8-11 Aug. 2005
Firstpage :
215
Lastpage :
216
Abstract :
Understanding the relationship between a protein's structure and its sequence is one of the most important tasks of current bioinformatics research. In this work, recurring protein sequence motifs are explored with a K-means clustering algorithm. No structural information is used during the clustering process, so the relationship between sequence similarity and structural similarity can be studied for sequence-based clusters. This work focuses on characterizing structural similarity so that the quality of sequence clusters can be assessed accurately. Analysis of the results reveals that the combined metric of distance matrix root mean squared deviation for sequence clusters (dmRMSD_SC) and torsion angle RMSD_SC (taRMSD_SC) provides a reliable indication of structural similarity for sequence clusters. Based on this combined metric, the recurrent sequence clusters with high structural similarity are used to generate sequence motifs. The common 3D structure of a sequence motif is represented by both the representative backbone torsion angles and the average distance matrices of the sequence cluster used to produce the motif. These motifs provide a foundation for developing a protein vocabulary that reflects sequence-structure correspondence.
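The abstract names two structural similarity measures but does not give their formulas. The following is a minimal Python sketch of how such measures are commonly computed, under stated assumptions: dmRMSD is taken as the RMSD between the pairwise distance matrices of two equal-length backbone segments, taRMSD as the RMSD between their torsion angles in degrees with wrap-around at 180, and the cluster-level (_SC) value as the mean over all pairs of cluster members. The function names `dm_rmsd`, `ta_rmsd`, and `cluster_mean` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def distance_matrix(coords):
    """Pairwise Euclidean distances between the atoms of one segment."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def dm_rmsd(coords_a, coords_b):
    """RMSD between the upper triangles of two segments' distance matrices.

    Assumption: both segments list one 3D point per residue (e.g. C-alpha).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("segments must have the same length")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    pairs = list(combinations(range(len(coords_a)), 2))
    total = sum((da[i][j] - db[i][j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


def ta_rmsd(angles_a, angles_b):
    """RMSD between two torsion-angle lists (degrees), wrapping at +/-180."""
    if len(angles_a) != len(angles_b):
        raise ValueError("angle lists must have the same length")
    diffs = [(a - b + 180.0) % 360.0 - 180.0 for a, b in zip(angles_a, angles_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def cluster_mean(metric, members):
    """Average a pairwise metric over all pairs of cluster members.

    Assumption: this is one plausible cluster-level aggregate, not
    necessarily the definition used by the authors.
    """
    pairs = list(combinations(members, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)
```

Because dmRMSD compares internal distances rather than coordinates, it needs no structural superposition: translating or rotating a segment leaves its value unchanged. A cluster whose members give low values on both measures would count as structurally similar under this sketch.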
Keywords :
biochemistry; biology computing; data mining; genetics; mean square error methods; molecular biophysics; proteins; statistical analysis; K-means clustering algorithm; average distance matrix; backbone torsion angle; bioinformatics research; clustering process; distance matrix root mean squared deviation; mining protein sequence motifs; protein structure; protein vocabulary reflecting sequence-structure; sequence-based clusters; structural similarity; torsion angle; Amino acids; Bioinformatics; Biology; Clustering algorithms; Computer science; Protein engineering; Protein sequence; Sequences; Spine; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
Print_ISBN :
0-7695-2442-7
Type :
conf
DOI :
10.1109/CSBW.2005.93
Filename :
1540604