Title :
Classifying proteins as extracellular using programmatic motifs and genetic programming
Author :
Koza, John R. ; Bennett, Forrest H., III ; Andre, David
Author_Institution :
Dept. of Comput. Sci., Stanford Univ., CA, USA
Abstract :
As newly sequenced proteins are deposited into the world's ever-growing archive of protein sequences, they are typically tested immediately by various computerized algorithms for clues as to their biological structure and function. One question about a new protein involves its cellular location, that is, where the protein resides in a living organism (extracellular, intracellular, etc.). A paper by J. Cedano et al. (1997) reported a human-created five-way classification algorithm for cellular location, built using statistical techniques, with 76% accuracy. This paper describes a two-way classification algorithm, evolved using genetic programming, that determines whether a protein is extracellular with 83% accuracy. Unlike the statistical calculation, the genetically evolved algorithm employs a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, memory, data structures, set-creating operations, macro definitions, recursion, etc. The genetically evolved classification algorithm can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif. The genetically evolved program constitutes an instance of an evolutionary computation technique producing a solution to a problem that is competitive with that produced using human intelligence.
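To make the idea of a programmatic motif more concrete, the following is a minimal hand-written sketch, not the evolved program from the paper. It only illustrates how iteration, a memory cell, set membership tests, conditionals, and arithmetic can combine into a two-way classifier over a protein sequence. The residue sets, weights, threshold, and the function name toy_programmatic_motif are all assumptions made up for illustration.

```python
# Hypothetical residue sets and weights, chosen only for illustration;
# none of these values come from the paper.
SET_A = frozenset("AVLIMFW")  # residues that raise the score
SET_B = frozenset("DEKR")     # residues that lower the score


def toy_programmatic_motif(sequence: str) -> bool:
    """Return True if this toy rule labels the sequence extracellular."""
    memory = 0.0  # single memory cell, updated while scanning the sequence
    for residue in sequence.upper():  # iteration over every residue
        if residue in SET_A:          # conditional on set membership
            memory += 1.0             # arithmetic update of memory
        elif residue in SET_B:
            memory -= 0.5
    # Two-way decision: positive score means "extracellular" in this sketch.
    return memory > 0.0
```

Unlike a conventional fixed-pattern motif, a rule of this shape can accumulate evidence across the entire sequence before deciding. In the paper's setting the structure and constants of such a program would be found by genetic programming rather than written by hand.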
Keywords :
biology computing; genetic algorithms; molecular biophysics; pattern classification; proteins; arithmetic functions; biological structure; cellular location; computerized algorithms; conditional operations; data structures; evolutionary computation technique; extracellular proteins; genetic programming; genetically evolved algorithm; genetically evolved classification algorithm; human created five way algorithm; human intelligence; living organism; macro definitions; newly sequenced proteins; programmatic motif; programmatic motifs; protein classification; protein motif; protein sequences; set creating operations; statistical techniques; subroutines; two way classification algorithm; Arithmetic; Biology computing; Classification algorithms; Data structures; Evolutionary computation; Extracellular; Genetic programming; Organisms; Proteins; Testing;
Conference_Title :
The 1998 IEEE International Conference on Evolutionary Computation Proceedings, IEEE World Congress on Computational Intelligence
Conference_Location :
Anchorage, AK
Print_ISBN :
0-7803-4869-9
DOI :
10.1109/ICEC.1998.699503