Title : 
Efficient Design of Bio-Basis Function to Predict Protein Functional Sites Using Kernel-Based Classifiers

Author :
Maji, Pradipta ; Das, Chandra

Author_Institution :
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India

Abstract :
To apply powerful kernel-based pattern recognition algorithms, such as support vector machines, to the prediction of functional sites in proteins, the amino acid sequences must first be encoded numerically. To this end, a new string kernel function, termed the modified bio-basis function, is proposed that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel function is built on the conventional bio-basis function and, like a conventional kernel function, requires a bio-basis string as a support. The concept of the zone of influence of a bio-basis string is introduced in the proposed kernel function to account for the influence of each bio-basis string in the nonnumerical sequence space. An efficient method is also described for selecting a set of bio-basis strings for the proposed kernel function by integrating the Fisher ratio with a novel concept, the degree of resemblance. This integration enables the method to select a reduced set of relevant and nonredundant bio-basis strings.
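For orientation, the following is a minimal sketch of the general idea the abstract builds on, not the authors' modified bio-basis function. It assumes the conventional bio-basis function as it is commonly stated, phi(x, b) = exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), where s is a position-wise sum of amino acid substitution scores between equal-length peptide windows. The helper names (`toy_score`, `similarity`, `bio_basis`, `encode`, `fisher_ratio`) and the +4/-1 scoring are hypothetical stand-ins for a real substitution matrix such as BLOSUM62. The zone of influence and the degree of resemblance are not defined in this record, so the sketch omits them and ranks candidate bio-basis strings by the Fisher ratio alone.

```python
import math
from statistics import mean, pvariance


def toy_score(a: str, b: str) -> int:
    # Hypothetical placeholder for a substitution matrix such as BLOSUM62:
    # +4 for identical residues, -1 otherwise.
    return 4 if a == b else -1


def similarity(x: str, b: str) -> int:
    # Position-wise sum of substitution scores for two aligned,
    # equal-length peptide windows.
    if len(x) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(toy_score(p, q) for p, q in zip(x, b))


def bio_basis(x: str, b: str, gamma: float = 1.0) -> float:
    # Conventional bio-basis function (assumed form): equals 1 when x == b
    # and decays toward 0 as x becomes less similar to the bio-basis string b.
    sbb = similarity(b, b)
    return math.exp(gamma * (similarity(x, b) - sbb) / sbb)


def encode(x: str, bases: list[str], gamma: float = 1.0) -> list[float]:
    # Map one nonnumerical sequence to a numerical feature vector,
    # one coordinate per bio-basis string.
    return [bio_basis(x, b, gamma) for b in bases]


def fisher_ratio(pos: list[float], neg: list[float]) -> float:
    # Two-class Fisher ratio of a single feature:
    # (mean_pos - mean_neg)^2 / (var_pos + var_neg).
    diff2 = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        return math.inf if diff2 > 0 else 0.0
    return diff2 / denom
```

A possible usage, with made-up peptide windows: each candidate string is scored by how well its bio-basis feature separates the two classes, and the top-ranked candidates would be kept as bio-basis strings.

```python
positives = ["ACDK", "ACDR", "ACEK"]  # hypothetical functional-site windows
negatives = ["WWGP", "WYGP", "ACGP"]  # hypothetical non-site windows
candidates = positives + negatives

ranked = sorted(
    candidates,
    key=lambda b: fisher_ratio(
        [bio_basis(x, b) for x in positives],
        [bio_basis(x, b) for x in negatives],
    ),
    reverse=True,
)
features = encode("ACDK", ranked[:2])  # input vector for an SVM
```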

Keywords :
bioinformatics; molecular biophysics; molecular configurations; pattern classification; proteins; support vector machines; amino acids encoding; biobasis function design; biobasis string zone of influence; kernel based classifiers; kernel based pattern recognition algorithms; modified biobasis function; nonnumerical sequence space; numerical feature space; protein functional site prediction; string kernel function; support vector machines; Bioinformatics; Biological information theory; Pattern recognition; Sequences; Support vector machines; Bioinformatics; functional site prediction; pattern recognition; sequence analysis; support vector machines; Algorithms; Binding Sites; Computational Biology; Information Storage and Retrieval; Models, Molecular; Neural Networks (Computer); Pattern Recognition, Automated; Protein Binding; Sequence Analysis, Protein;

Journal_Title :
NanoBioscience, IEEE Transactions on

DOI :
10.1109/TNB.2010.2080684