Title :
Feature extraction for improved profile HMM based biological sequence analysis
Author :
Plötz, Thomas ; Fink, Gernot A.
Author_Institution :
Fac. of Technol., Bielefeld Univ., Germany
Abstract :
State-of-the-art systems for biological sequence analysis employ statistical modeling techniques, most notably so-called profile HMMs. However, all of these approaches still rely on a purely symbolic sequence representation, which severely limits their ability to describe weak similarities between remotely homologous members of sequence families. We therefore propose a multi-channel, signal-like sequence representation based on a combination of several numerically encoded biochemical properties of the individual residues. From this representation, features that capture relevant local sequence properties are extracted by applying wavelet and principal component analysis. Evaluation results on a challenging sequence family classification task show that profile HMMs trained on the feature-based sequence representation significantly outperform discrete models.
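Illustrative sketch (not taken from the paper): the abstract does not state which biochemical properties, window length, wavelet, or feature dimension the authors use, so every such choice below is an assumption made only to show the general idea. The sketch encodes each residue with two channels (the Kyte-Doolittle hydropathy index and a crude side-chain charge), applies one level of a Haar wavelet transform to sliding windows, and reduces the resulting coefficients with principal component analysis. The function names `encode`, `haar_step`, `window_features`, and `pca_project` are hypothetical.

```python
import numpy as np

# Assumed channel 1: Kyte-Doolittle hydropathy index per amino acid.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumed channel 2: crude side-chain charge near neutral pH (0 if not listed).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq):
    """Map a protein sequence to a (length, channels) numeric signal."""
    return np.array([[HYDROPATHY[r], CHARGE.get(r, 0.0)] for r in seq])


def haar_step(x):
    """One level of the orthonormal Haar transform along axis 0.

    Expects an even number of rows; returns (approximation, detail).
    """
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def window_features(signal, width=8):
    """Slide a window over the signal and stack Haar coefficients per frame."""
    frames = []
    for start in range(len(signal) - width + 1):
        approx, detail = haar_step(signal[start:start + width])
        frames.append(np.concatenate([approx.ravel(), detail.ravel()]))
    return np.array(frames)


def pca_project(features, n_components=4):
    """Project mean-centered frames onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Arbitrary example sequence, used only to exercise the pipeline.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
feats = pca_project(window_features(encode(seq)))
```

Each row of `feats` is one real-valued feature vector per window position. A sequence of such vectors could serve as the observations of a continuous-emission HMM in place of discrete residue symbols, which is the kind of substitution the abstract describes.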
Keywords :
biology computing; feature extraction; hidden Markov models; principal component analysis; proteins; sequences; wavelet transforms; biological sequence analysis; multichannel signal sequence representation; symbolic sequence representation; wavelet analysis; Amino acids; Biological information theory; Biological system modeling; Data mining; Discrete wavelet transforms
Conference_Title :
Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334187