Title : 
Recognizing patterns in protein sequences using iteration-performing calculations in genetic programming
         
        
        
            Author_Institution : 
Dept. of Comput. Sci., Stanford Univ., CA, USA
         
        
        
        
        
            Abstract : 
Uses genetic programming with automatically defined functions (ADFs) for the dynamic creation of a pattern-recognizing computer program consisting of initially-unknown detectors, an initially-unknown iterative calculation incorporating the as-yet-undiscovered detectors, and an initially-unspecified final calculation incorporating the results of the as-yet-unspecified iteration. The program's goal is to recognize a given protein segment as being a transmembrane domain or non-transmembrane area of the protein. Genetic programming with automatic function definition is given a training set of differently-sized mouse protein segments and their correct classification. Correlation is used as the fitness measure. Automatic function definition enables genetic programming to dynamically create subroutines (detectors). A restricted form of iteration is introduced to enable genetic programming to perform calculations on the values returned by the detectors. When cross-validated, the best genetically-evolved recognizer for transmembrane domains achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than that recently reported for five other methods.
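
The three-part program structure named in the abstract (detectors, a restricted iteration over the segment, and a final calculation) together with a correlation-based fitness measure can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the correlation formula, so the sketch assumes the confusion-matrix (Matthews) form, and the residue set, the `detector` function, and the zero threshold are hypothetical hand-written stand-ins for what genetic programming would evolve, not the evolved program reported in the paper.

```python
import math

# Hypothetical residue set for illustration only; in the paper the detectors
# are evolved as automatically defined functions, not written by hand.
HYDROPHOBIC = set("AVLIFMW")


def detector(residue):
    """Stand-in detector: +1 for a residue in the assumed set, else -1."""
    return 1 if residue in HYDROPHOBIC else -1


def classify(segment):
    """Classify a protein segment as transmembrane (True) or not (False)."""
    total = 0
    # Restricted iteration: exactly one pass over the residues of the segment,
    # accumulating the values returned by the detector.
    for residue in segment:
        total += detector(residue)
    # Final calculation: an assumed threshold on the accumulated value.
    return total > 0


def matthews_correlation(tp, tn, fp, fn):
    """Correlation between predicted and true classes from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention used here when the correlation is undefined
    return (tp * tn - fp * fn) / denom


def fitness(examples):
    """Score a classifier on (segment, is_transmembrane) training pairs."""
    tp = tn = fp = fn = 0
    for segment, is_transmembrane in examples:
        predicted = classify(segment)
        if predicted and is_transmembrane:
            tp += 1
        elif not predicted and not is_transmembrane:
            tn += 1
        elif predicted and not is_transmembrane:
            fp += 1
        else:
            fn += 1
    return matthews_correlation(tp, tn, fp, fn)


# Made-up toy segments of different sizes, not data from the paper.
toy_examples = [
    ("LLVVAAII", True),
    ("KKDDEERR", False),
    ("LLKKLLAV", True),
    ("DDEKR", False),
]
print(fitness(toy_examples))
```

Because the loop is bounded by the segment length, the iteration always terminates, which is one plausible reading of why a restricted form of iteration suits evolved programs; the correlation ranges from -1 (every prediction wrong) to +1 (every prediction right).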
         
        
            Keywords : 
biology computing; biomembranes; functions; genetic algorithms; iterative methods; pattern recognition; proteins; subroutines; automatic function definition; automatically defined functions; classification; correlation; detectors; dynamic creation; dynamic subroutine creation; fitness measure; genetic programming; initially-unknown detectors; initially-unknown iterative calculation; initially-unspecified final calculation; iteration-performing calculations; mouse; nontransmembrane area; pattern-recognizing computer program; protein segment recognition; protein sequences; training set; transmembrane domain; undiscovered detectors; unspecified iteration; Amino acids; Biomembranes; Chemicals; Computer science; Detectors; Error analysis; Genetic programming; Mice; Pattern recognition; Proteins;
         
        
        
        
            Conference_Title : 
Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, 1994
         
        
            Conference_Location : 
Orlando, FL
         
        
            Print_ISBN : 
0-7803-1899-4
         
        
        
            DOI : 
10.1109/ICEC.1994.350008