  • DocumentCode
    3249769
  • Title
    Classification and function estimation of protein by using data compression and genetic algorithms
  • Author
    Chiba, Shinji ; Sugawara, Ken ; Watanabe, Toshinori
  • Author_Institution
    Sendai Nat. Coll. of Technol., Japan
  • Volume
    2
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    839
  • Abstract
    Proteins have complicated spatial structures, and their chemical and physical functions originate from those structures. Predicting the structure and function of a protein from its DNA sequence or amino acid sequence is important in biology, medical science, protein engineering, and related fields. At present, however, no method predicts them accurately from the sequence alone. Instead, several approaches estimate function approximately through similarity retrieval of sequences. We propose a new method for similarity retrieval of amino acid sequences, based on the concept of homology retrieval using data compression. Dictionary-based compression lets us describe text data as an n-dimensional vector using n dictionaries, each generated by compressing one of n typical texts, and it also lets us classify texts by their similarity. For clear classification, it is effective to use ideal character strings as dictionaries. In this paper, we introduce a genetic algorithm for dictionary generation and use it to classify amino acid sequences. The effectiveness of our proposal is examined using real genome data.
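    The n-dictionary representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes zlib's preset-dictionary feature (`zdict`) as a stand-in for the paper's dictionary compression, it omits the genetic-algorithm step entirely, and the function names `compressed_size`, `feature_vector`, and `nearest_dictionary` are hypothetical.

```python
import zlib


def compressed_size(seq: str, dictionary: str) -> int:
    """Size in bytes of `seq` compressed with `dictionary` preloaded.

    Assumption: zlib's preset dictionary stands in for the paper's
    dictionary-based compressor.
    """
    comp = zlib.compressobj(level=9, zdict=dictionary.encode("ascii"))
    return len(comp.compress(seq.encode("ascii")) + comp.flush())


def feature_vector(seq: str, dictionaries: list[str]) -> list[float]:
    """Describe `seq` as an n-dimensional vector of compression ratios,
    one component per dictionary (smaller means more similar)."""
    raw = len(seq.encode("ascii"))
    return [compressed_size(seq, d) / raw for d in dictionaries]


def nearest_dictionary(seq: str, dictionaries: list[str]) -> int:
    """Index of the dictionary under which `seq` compresses best."""
    vec = feature_vector(seq, dictionaries)
    return min(range(len(vec)), key=vec.__getitem__)
```

    A sequence that shares long substrings with one dictionary text compresses to fewer bytes under that dictionary, so the smallest component of the vector points at the most similar typical text. In the paper, the dictionaries themselves are tuned by a genetic algorithm, which this sketch does not attempt.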
  • Keywords
    biocomputing; data compression; genetic algorithms; proteins; DNA sequence; amino acid sequence; biology; function classification; function estimation; homology retrieval; ideal character strings; medical science; n-dimensional vector; protein; protein engineering; similarity retrieval; spatial structure; Amino acids; Biology; Chemicals; DNA; Data compression; Dictionaries; Genetic algorithms; Information retrieval; Protein engineering; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2001. Proceedings of the 2001 Congress on
  • Conference_Location
    Seoul
  • Print_ISBN
    0-7803-6657-3
  • Type
    conf
  • DOI
    10.1109/CEC.2001.934277
  • Filename
    934277