DocumentCode :
2571075
Title :
Homology based Multi-instance Kernel combination for Gram-negative protein subcellular localization
Author :
Mei, Suyu ; Fei, Wang
Author_Institution :
Shanghai Key Lab. of Intell. Inf. Process., Fudan Univ., Shanghai, China
fYear :
2010
fDate :
16-18 April 2010
Firstpage :
5
Lastpage :
9
Abstract :
Previous computational models generally exclude homologous sequences from the training set to reduce potential predictive bias. This paper proposes a hierarchical kernel that incorporates homology to define the similarity between two protein sequences more accurately. Framing the task as multi-instance learning, we view each homologous sequence as one evolutionary instance of the target sequence, and all homologous sequences together constitute one homology bag. The bottom-level kernel is the k-mer spectrum kernel, which measures the similarity between any two instances. The middle-level multi-instance kernel, called the Homology-based Multi-instance Kernel (HoMIKernel), is defined as the sum of the spectrum kernels between the instances of two homology bags and thus measures the similarity between the bags. By varying the k-mer size and compressing the 20-letter amino acid alphabet, we derive multiple HoMIKernels, which are further combined into a top-level kernel, HoMIKernel+, to capture more contextual information and cover motifs of varying sizes. We evaluate HoMIKernel+ on a Gram-negative benchmark dataset. The experiments show that HoMIKernel+ outperforms the baseline models and that incorporating homologous sequences improves predictive performance.
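The three kernel levels described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reduced alphabet `REDUCED`, the cosine normalization in `normalized`, the uniform (unweighted) sum in `combined_kernel`, the default k-mer sizes, and all function names are hypothetical choices made for this example. The toy sequences in the usage section are made up.

```python
from collections import Counter
from functools import partial
from itertools import product
from math import sqrt

# Hypothetical reduced alphabet (assumption, not the paper's grouping):
# each amino acid is mapped to a coarse group label.
_GROUPS = {"AVLIMC": "h", "FWY": "a", "STNQ": "p", "DE": "n", "KRH": "b", "GP": "s"}
REDUCED = {aa: label for group, label in _GROUPS.items() for aa in group}


def compress(seq, alphabet=None):
    """Rewrite a sequence in a reduced alphabet; unknown letters become 'x'."""
    if alphabet is None:
        return seq
    return "".join(alphabet.get(c, "x") for c in seq)


def spectrum(seq, k):
    """k-mer spectrum: counts of every contiguous length-k substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k):
    """Bottom level: inner product of the k-mer count vectors of s and t."""
    fs, ft = spectrum(s, k), spectrum(t, k)
    if len(fs) > len(ft):  # iterate over the smaller spectrum
        fs, ft = ft, fs
    return sum(c * ft[m] for m, c in fs.items())  # missing k-mers count as 0


def mi_kernel(bag_a, bag_b, k, alphabet=None):
    """Middle level (HoMIKernel-style): sum of spectrum kernels over all instance pairs."""
    a = [compress(s, alphabet) for s in bag_a]
    b = [compress(s, alphabet) for s in bag_b]
    return sum(spectrum_kernel(s, t, k) for s, t in product(a, b))


def normalized(kfun, bag_a, bag_b):
    """Cosine normalization (assumed) so bags of different sizes are comparable."""
    denom = sqrt(kfun(bag_a, bag_a) * kfun(bag_b, bag_b))
    return kfun(bag_a, bag_b) / denom if denom else 0.0


def combined_kernel(bag_a, bag_b, ks=(1, 2, 3), alphabets=(None, REDUCED)):
    """Top level (HoMIKernel+-style): unweighted sum of normalized multi-instance
    kernels over several k-mer sizes and alphabets (uniform weights are assumed)."""
    total = 0.0
    for k in ks:
        for alph in alphabets:
            total += normalized(partial(mi_kernel, k=k, alphabet=alph), bag_a, bag_b)
    return total


if __name__ == "__main__":
    # Each bag: a (made-up) target sequence followed by its (made-up) homologs.
    bag1 = ["MKVLAAGIV", "MKVLSAGIV"]
    bag2 = ["MKILAAGLV"]
    print(combined_kernel(bag1, bag2))
    print(combined_kernel(bag1, bag1))  # 6.0: six component kernels, each 1.0 on identical bags
```

Because each component is a sum of valid kernels followed by cosine normalization, every component lies in [0, 1], so the combined value ranges from 0 up to the number of components (here 3 k-mer sizes times 2 alphabets). A real pipeline would precompute the spectra once per sequence and could pass the resulting Gram matrix to an SVM that accepts precomputed kernels.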
Keywords :
biological techniques; molecular biophysics; molecular configurations; proteins; amino acids; contextual information; gram-negative protein subcellular localization; hierarchical kernel; homologous sequence; k-mer spectrum kernel; multiinstance learning; protein sequences; size-varying motifs; Amino acids; Computational intelligence; Computational modeling; Computer science; Feature extraction; Information processing; Kernel; Laboratories; Predictive models; Protein sequence; homology; kernel method; multi-instance kernel; protein subcellular localization; spectrum kernel;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedical Technology (ICBBT), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6775-4
Type :
conf
DOI :
10.1109/ICBBT.2010.5479023
Filename :
5479023