Title :
Minimum description length based protein secondary structure prediction
Author :
Hategan, Andrea ; Tabus, Ioan
Author_Institution :
Inst. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
Abstract :
This paper introduces a new algorithm for predicting the secondary structure of a protein from its primary structure, i.e., its amino acid sequence. The problem consists of finding a segmentation of the amino acid sequence in which each segment carries a secondary structure label, e.g., helix, strand, or coil. Our algorithm differs from existing probabilistic inference algorithms in that it uses probabilistic models suited to directly encoding the joint information represented by the pair (amino acid sequence, secondary structure labels), and it selects the secondary structure sequence that yields the shortest representation, or description length, in line with the minimum description length principle. An additional benefit of this approach is that it provides not only a secondary structure prediction tool, but also a tool that efficiently compresses the joint sequences defining the primary and secondary structure information in proteins. Preliminary results for prediction and compression show good performance, which in certain aspects is better than that of comparable algorithms.
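The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a hypothetical first-order coder in which each label is encoded given the previous label and each residue is encoded given its label, over a toy three-letter alphabet. The names `EMIT`, `TRANS`, `START`, `joint_bits`, and `mdl_labels`, and all probabilities, are invented for illustration. Under these assumptions the labeling with the shortest joint code length can be found by dynamic programming.

```python
import math

LABELS = "HEC"  # helix, strand, coil

# Hypothetical toy models (NOT from the paper): each label favors one residue.
EMIT = {
    "H": {"A": 0.6, "V": 0.2, "G": 0.2},
    "E": {"A": 0.2, "V": 0.6, "G": 0.2},
    "C": {"A": 0.2, "V": 0.2, "G": 0.6},
}
# Labels tend to persist, so changing label costs more bits than staying.
TRANS = {r: {s: (0.8 if r == s else 0.1) for s in LABELS} for r in LABELS}
START = {s: 1.0 / len(LABELS) for s in LABELS}


def bits(p):
    """Ideal code length, in bits, of an event with probability p."""
    return -math.log2(p)


def joint_bits(seq, labels):
    """Description length of the pair (seq, labels) under the toy models."""
    total = bits(START[labels[0]]) + bits(EMIT[labels[0]][seq[0]])
    for i in range(1, len(seq)):
        total += bits(TRANS[labels[i - 1]][labels[i]])
        total += bits(EMIT[labels[i]][seq[i]])
    return total


def mdl_labels(seq):
    """Return (labels, length): the labeling with minimum joint code length."""
    # best[s]: shortest code length of a labeled prefix whose last label is s
    best = {s: bits(START[s]) + bits(EMIT[s][seq[0]]) for s in LABELS}
    back = []  # back[i][s]: best previous label when position i+1 has label s
    for a in seq[1:]:
        step, new = {}, {}
        for s in LABELS:
            prev = min(LABELS, key=lambda r: best[r] + bits(TRANS[r][s]))
            step[s] = prev
            new[s] = best[prev] + bits(TRANS[prev][s]) + bits(EMIT[s][a])
        back.append(step)
        best = new
    # Trace the winning labeling back from the cheapest final label.
    last = min(LABELS, key=lambda s: best[s])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    return "".join(reversed(path)), best[last]


seq = "AAAAVVVVGGGG"
labels, length = mdl_labels(seq)
print(labels, round(length, 2), "bits")
```

In this sketch the same code length serves both purposes named in the abstract: minimizing it gives the predicted labeling, and the minimum itself is the cost of compressing the joint sequence under the assumed models.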
Keywords :
image representation; image segmentation; image sequences; proteins; amino acid sequence; minimum description length; minimum representation; probabilistic inference algorithms; protein primary structure; protein secondary structure prediction; secondary structure labels; secondary structure prediction tool; secondary structure sequence; sequence segmentation; Amino acids; Context; Encoding; Prediction algorithms; Proteins; Signal processing algorithms; Training
Conference_Title :
Signal Processing Conference, 2008 16th European
Conference_Location :
Lausanne