Title :
The Homology Kernel: A Biologically Motivated Sequence Embedding into Euclidean Space
Author :
Eskin, Eleazar ; Snir, Sagi
Author_Institution :
Department of Computer Science and Engineering, University of California, San Diego, eeskin@cs.ucsd.edu
Abstract :
Part of the challenge of modeling protein sequences is their discrete nature. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly to discrete sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space and then operate on the embedded points. In this paper, we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. We apply the homology kernel in several ways. We demonstrate that the homology kernel can be used for protein family classification, where it outperforms state-of-the-art methods for remote homology detection. We show that the homology kernel can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
Keywords :
Biological system modeling; Computer science; Embedded computing; Hidden Markov models; Kernel; Mathematical model; Mathematics; Power engineering and energy; Prediction methods; Protein engineering;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on
Print_ISBN :
0-7803-9387-2
DOI :
10.1109/CIBCB.2005.1594915