DocumentCode
3249769
Title
Classification and function estimation of protein by using data compression and genetic algorithms
Author
Chiba, Shinji ; Sugawara, Ken ; Watanabe, Toshinori
Author_Institution
Sendai Nat. Coll. of Technol., Japan
Volume
2
fYear
2001
fDate
2001
Firstpage
839
Abstract
Proteins have complicated spatial structures, and their chemical and physical functions originate from those structures. Predicting the structure and functions of a protein from a DNA sequence or an amino acid sequence is important for biology, medical science, protein engineering, and related fields. At present, however, no method can predict them accurately from the sequence. Instead, several approaches estimate the functions approximately through similarity retrieval of sequences. We propose a new method for similarity retrieval of amino acid sequences, based on the concept of homology retrieval using data compression. Dictionary-based compression lets us describe text data as an n-dimensional vector, using n dictionaries generated by compressing n typical texts, and classify the texts by their similarity. To separate the classes clearly, it is effective to use ideal character strings as dictionaries. In this paper, we introduce a genetic algorithm for dictionary generation and use it to classify amino acid sequences. The effectiveness of our proposal is examined using real genome data.
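The vector description mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes zlib's preset-dictionary feature as a stand-in for the dictionary compressor, uses the compressed-size ratio as each coordinate, and leaves out the genetic-algorithm step entirely. The function names and the toy strings are hypothetical.

```python
import zlib


def compressed_size(text: bytes, dictionary: bytes) -> int:
    """Size of `text` after DEFLATE compression primed with `dictionary`.

    Assumption: zlib's preset dictionary (zdict) stands in for the
    dictionary compressor described in the abstract.
    """
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(text) + comp.flush())


def feature_vector(text: bytes, dictionaries: list[bytes]) -> list[float]:
    """Describe `text` as an n-dimensional vector, one coordinate per dictionary.

    Each coordinate is the compression ratio obtained with that dictionary;
    a smaller value means the text resembles the dictionary more closely.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return [compressed_size(text, d) / len(text) for d in dictionaries]


def nearest_dictionary(text: bytes, dictionaries: list[bytes]) -> int:
    """Index of the dictionary that compresses `text` best."""
    vector = feature_vector(text, dictionaries)
    return min(range(len(vector)), key=vector.__getitem__)


# Hypothetical toy strings over the amino acid alphabet, not real genome data.
dict_a = b"ACDEFGHIKLMNPQRSTVWY" * 3
dict_b = b"YWVTSRQPNMLKIHGFEDCA" * 3
query = b"ACDEFGHIKLMNPQRSTVWY" * 2

vector = feature_vector(query, [dict_a, dict_b])
nearest = nearest_dictionary(query, [dict_a, dict_b])
```

Here `query` shares long substrings with `dict_a`, so its first coordinate is smaller than its second and `nearest` selects the first dictionary. In the setting the abstract describes, the dictionaries would be derived from typical texts or evolved by a genetic algorithm rather than written by hand.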
Keywords
biocomputing; data compression; genetic algorithms; proteins; DNA sequence; amino acid sequence; biology; function classification; function estimation; homology retrieval; ideal character strings; medical science; n-dimensional vector; protein; protein engineering; similarity retrieval; spatial structure; Amino acids; Biology; Chemicals; DNA; Data compression; Dictionaries; Genetic algorithms; Information retrieval; Protein engineering; Sequences
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation, 2001. Proceedings of the 2001 Congress on
Conference_Location
Seoul
Print_ISBN
0-7803-6657-3
Type
conf
DOI
10.1109/CEC.2001.934277
Filename
934277