DocumentCode :
2737673
Title :
Poster: Identification and classification of internal repeats in proteins
Author :
Yang, Cing-Han ; Wang, Hsin-Wei ; Shih, Tsan-Huang ; Pai, Tun-Wen
Author_Institution :
Nat. Taiwan Ocean Univ., Keelung, Taiwan
fYear :
2011
fDate :
3-5 Feb. 2011
Firstpage :
266
Lastpage :
266
Abstract :
Internal repeats are widely found in proteins and are considered important in protein evolution and function. Three major types of internal repeat are distinguished: domain, solenoid, and fibrous repeats. These repeats may be involved in protein-protein interaction as well as in binding to various ligands such as DNA and RNA. For example, tetratricopeptide repeats (TPR) are involved in cell-cycle regulation, transcriptional regulation, protein transport, and assisting protein folding, and the TATA-binding protein (TBP) is a transcription factor that binds specifically to a DNA sequence. To identify and classify various types of protein repeats with different lengths from a query protein sequence or structure, we have designed a comprehensive system that analyzes the autocorrelation relationships of sequence contents and the topology of secondary structures within a protein. A complete database containing verified fundamental repeat sequence peptides and structural units for homologous matching analysis has also been constructed. The data flow diagram of the proposed identification system contains two major parts: the Repeat Database and the Internal Repeat Analyzer. The Repeat Database is constructed by evaluating proteins from SCOP and Pfam through an autocorrelation mechanism. The Internal Repeat Analyzer is designed as a three-level hierarchical analysis for detecting domain, solenoid, and fibrous repeats, respectively. In addition, an iteratively refined multiple structure alignment tool has been developed for comparing and verifying the extracted internal repeat substructures. In this study, the collected database contains 162 domain families with repeat characteristics, 28 fundamental repeat structure units, and 129 repeat subsequences retrieved from 1,961 superfamilies, and we have demonstrated that the proposed system can efficiently identify repeat topologies of proteins.
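The abstract does not spell out the autocorrelation mechanism, so the following is only a minimal sketch of one common way to autocorrelate a sequence with itself: for each shift (lag), count the fraction of positions whose residue matches the residue that many positions later. The function names `autocorrelation` and `best_repeat_lag`, the exact-match scoring, and the toy input are assumptions made for illustration; they are not taken from the authors' system, which also uses secondary-structure topology.

```python
def autocorrelation(seq, max_lag=None):
    """Return {lag: fraction of positions i where seq[i] == seq[i + lag]}.

    Assumption: simple identity matching; a real tool would likely use a
    substitution matrix or structural information instead.
    """
    n = len(seq)
    if max_lag is None:
        max_lag = n // 2
    max_lag = min(max_lag, n - 1)  # keep the denominator (n - lag) positive
    scores = {}
    for lag in range(1, max_lag + 1):
        matches = sum(1 for i in range(n - lag) if seq[i] == seq[i + lag])
        scores[lag] = matches / (n - lag)
    return scores


def best_repeat_lag(seq, max_lag=None):
    """Return the lag with the highest score, preferring the shortest on ties."""
    scores = autocorrelation(seq, max_lag)
    return max(scores, key=lambda lag: (scores[lag], -lag))


if __name__ == "__main__":
    toy = "ABCDE" * 4  # hypothetical toy sequence, not a real protein
    print(best_repeat_lag(toy))
```

On a perfectly periodic toy string the highest-scoring lag corresponds to the repeat unit length; real protein repeats are degenerate, so a peak in the profile would be a candidate to verify rather than a final answer.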
Keywords :
biochemistry; bioinformatics; cellular biophysics; database management systems; molecular biophysics; proteins; proteomics; DNA sequence; Internal Repeat Analyzer; Pfam; Repeat Database; SCOP; TATA-binding protein; autocorrelation relationships; cell-cycle regulation; homologous matching analysis; internal repeats; ligand binding; protein folding; protein transport; protein-protein interaction; proteins; repeat sequence peptides; tetratrico peptide repeats; three-level hierarchical analysis; transcription factor; transcriptional regulation; Correlation; DNA; Databases; Proteins; RNA; Solenoids; Topology; autocorrelation; internal repeat; protein repeats; repeat unit;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-61284-851-8
Type :
conf
DOI :
10.1109/ICCABS.2011.5729920
Filename :
5729920