DocumentCode
700149
Title
Minimum description length based protein secondary structure prediction
Author
Hategan, Andrea ; Tabus, Ioan
Author_Institution
Inst. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
fYear
2008
fDate
25-29 Aug. 2008
Firstpage
1
Lastpage
5
Abstract
This paper introduces a new algorithm for predicting the secondary structure of a protein from its primary structure, i.e., its amino acid sequence. The problem consists of finding a segmentation of the amino acid sequence in which each segment carries a secondary structure label, e.g., helix, strand, or coil. Our algorithm differs from existing probabilistic inference algorithms in that it uses probabilistic models suited to directly encoding the joint information represented by the pair (amino acid sequence, secondary structure labels), and it selects as the winner the secondary structure sequence that yields the shortest representation, or description length, in line with the minimum description length principle. An additional benefit of our approach is that it provides not only a secondary structure prediction tool but also a tool that efficiently compresses the joint sequences defining the primary and secondary structure information in proteins. Preliminary results for prediction and compression show good performance, in certain respects better than that of comparable algorithms.
Keywords
image representation; image segmentation; image sequences; proteins; amino acid sequence; minimum description length; minimum representation; probabilistic inference algorithms; protein primary structure; protein secondary structure prediction; secondary structure labels; secondary structure prediction tool; secondary structure sequence; sequence segmentation; Amino acids; Context; Encoding; Prediction algorithms; Proteins; Signal processing algorithms; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2008 16th European
Conference_Location
Lausanne
ISSN
2219-5491
Type
conf
Filename
7080681