DocumentCode :
419553
Title :
Feature extraction for improved profile HMM based biological sequence analysis
Author :
Plötz, Thomas ; Fink, Gernot A.
Author_Institution :
Fac. of Technol., Bielefeld Univ., Germany
Volume :
2
fYear :
2004
fDate :
23-26 Aug. 2004
Firstpage :
315
Abstract :
State-of-the-art systems for biological sequence analysis employ statistical modeling techniques, most notably so-called profile HMMs. However, all approaches still rely on a purely symbolic sequence representation, which severely limits their capabilities in describing weak similarities between remotely homologous members of sequence families. Therefore, we propose a multi-channel signal-like sequence representation based on a combination of several numerically encoded biochemical properties of the individual residues. From this representation, features capturing relevant local sequence properties are extracted by applying wavelet and principal component analysis. Evaluation results on a challenging task of sequence family classification prove that profile HMMs trained on the feature-based sequence representation significantly outperform discrete models.
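
The pipeline outlined in the abstract (numeric encoding of residues, wavelet analysis of local windows, then principal component analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: the two channels (Kyte-Doolittle hydropathy and a crude net-charge indicator), the window width of 8, the single-level Haar transform, the number of retained components, the example sequence, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy index (one illustrative biochemical channel).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude net charge at neutral pH (second channel; a simplifying assumption).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq):
    """Map a protein sequence to a (length, channels) numeric signal."""
    return np.array([[HYDROPATHY[r], CHARGE.get(r, 0.0)] for r in seq])


def haar_level(x):
    """One level of the orthonormal Haar transform along axis 0 (even length)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def window_features(signal, width=8):
    """Slide a window over the signal and Haar-transform each window per channel."""
    feats = []
    for start in range(len(signal) - width + 1):
        approx, detail = haar_level(signal[start:start + width])
        feats.append(np.concatenate([approx.ravel(), detail.ravel()]))
    return np.array(feats)


def pca_project(feats, n_components=4):
    """Project centred feature vectors onto their leading principal components."""
    centred = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical example sequence
features = pca_project(window_features(encode(seq)))
print(features.shape)  # one low-dimensional feature vector per window position
```

Each row of `features` is a continuous-valued vector that could stand in for a discrete residue symbol when training a model with continuous emission densities; how the features are actually fed to the profile HMMs is not specified in this record.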
Keywords :
biology computing; feature extraction; hidden Markov models; principal component analysis; proteins; sequences; wavelet transforms; biological sequence analysis; feature extraction; hidden Markov model; multichannel signal sequence representation; principal component analysis; symbolic sequence representation; wavelet analysis; Amino acids; Biological information theory; Biological system modeling; Data mining; Discrete wavelet transforms; Feature extraction; Hidden Markov models; Principal component analysis; Proteins; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-2128-2
Type :
conf
DOI :
10.1109/ICPR.2004.1334187
Filename :
1334187