Title :
A Comprehensive System for Identifying Internal Repeat Substructures of Proteins
Author :
Kao, Hua-Ying ; Shih, Tsang-Huang ; Pai, Tun-Wen ; Lu, Ming-Da ; Hsu, Hui-Huang
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Taiwan Ocean Univ., Keelung, Taiwan
Abstract :
Repetitive substructures within a protein play an important role in understanding protein folding and stability, biological function, and genome evolution. About 25% of all proteins in eukaryotic species contain repeat structures, and most of them do not yet have resolved structural information. This study therefore aimed to design a comprehensive system for identifying internal repeats from either protein sequence or structural information. We curated a set of internal repeat units as a benchmark dataset for performing both sequence and structural alignment against the query sequence or structure. In addition to traditional BLAST algorithms applied to amino acid sequences and optimal structural superposition approaches applied to structures, a novel method was proposed that uses predicted secondary structure element information for internal repeat identification. Sequences were first transformed into Length Encoded Secondary Structure (LESS) profiles, followed by autocorrelation analysis. According to preliminary experimental results, the developed Internal Repeat Identification System (IRIS) can successfully identify internal repeats in known protein structures, and the web system is freely available at http://iris.cs.ntou.edu.tw/.
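The abstract does not specify how a LESS profile is encoded or how the autocorrelation is scored, so the following Python sketch is only an illustration of the general idea, not the IRIS implementation. It assumes a three-state secondary structure string (H = helix, E = strand, C = coil), assigns each residue the signed length of the run it belongs to, and reports the lag with the highest normalized autocorrelation as a candidate repeat period. The names `SIGN`, `less_profile`, `candidate_period`, and the `min_lag` default are hypothetical.

```python
from itertools import groupby

# Assumed scoring, not taken from the paper: helix positive, strand negative, coil zero.
SIGN = {"H": 1, "E": -1, "C": 0}


def length_encode(ss):
    """Run-length encode a secondary structure string into (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def less_profile(ss):
    """Per-residue profile: every residue gets SIGN[state] * length of its run."""
    profile = []
    for state, length in length_encode(ss):
        profile.extend([SIGN[state] * length] * length)
    return profile


def autocorrelation(profile):
    """Normalized autocorrelation of the mean-centered profile for lags 0..n-1."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    var = sum(c * c for c in centered)
    if var == 0:
        return [0.0] * n  # a flat profile carries no periodic signal
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var
        for lag in range(n)
    ]


def candidate_period(ss, min_lag=5):
    """Lag in [min_lag, n // 2] with the highest autocorrelation.

    min_lag is an assumed shortest repeat unit; it skips the trivially
    high correlation at very small lags.
    """
    ac = autocorrelation(less_profile(ss))
    return max(range(min_lag, len(ac) // 2 + 1), key=lambda lag: ac[lag])


if __name__ == "__main__":
    # Synthetic example: a 13-residue helix/coil/strand/coil unit repeated four times.
    ss = "HHHHHCCEEEECC" * 4
    print(length_encode(ss)[:4])
    print(candidate_period(ss))
```

In practice the secondary structure string would come from a prediction tool, and a real system would need to tolerate imperfect repeats (insertions, varying element lengths), which this exact-lag sketch does not attempt.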
Keywords :
biology computing; molecular biophysics; proteins; amino acid sequence; benchmark dataset; biological function; comprehensive system; eukaryote species; genome evolution; internal repeat identification system; internal repeat substructures; length encoded secondary structure profiles; optimal structural superposition; protein sequence; repetitive substructures; secondary structure element information; structural information; Amino acids; Autocorrelation; Bioinformatics; Biological information theory; Evolution (biology); Genomics; Iris; Protein engineering; Protein sequence; Stability; Length Encoded Secondary Structure; internal repeat unit; secondary structure element; sequence alignment; solenoid; structure alignment;
Conference_Title :
Complex, Intelligent and Software Intensive Systems (CISIS), 2010 International Conference on
Conference_Location :
Krakow
Print_ISBN :
978-1-4244-5917-9
DOI :
10.1109/CISIS.2010.92