• DocumentCode
    2198055
  • Title
    A Database of Selected Marine Genomics for Retrieving Distantly Related Proteins
  • Author
    Shih, Tsan-Huang ; Hsu, Yen-Chu ; Pai, Tun-Wen ; Tzou, Wen-Shyong ; Hu, Chin-Hua
  • Author_Institution
    Center for Marine Biosci. & Biotechnol., Nat. Taiwan Ocean Univ., Keelung, Taiwan
  • fYear
    2009
  • fDate
    17-19 Oct. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    With advances in biological techniques, research in marine evolution, ecology, and aquaculture has grown explosively in both volume and diversity. Tens of thousands of genomic sequences are now available for important marine species. However, most of the corresponding structures and functions remain unresolved. To discover the biological characteristics of the genomic sequences of a marine species, an efficient and effective method for detecting distantly related proteins, based on experimentally known functions from model species, is an important strategy. In this study, the Ensembl and NCBI genetic databases were used to build a preliminary database of selected marine species. The system, named the Marine Species Genome Database (MSGD), contains abundant DNA, RNA, and protein information. To identify remotely related proteins, we propose a novel LESS (length encoded secondary structure) profile that improves information retrieval, especially for protein sequences without resolved structures and with low sequence identity. The matching algorithm applies several existing secondary structure prediction techniques and an encoding mechanism based on the length distribution of secondary structure elements. Because protein secondary structures are conserved in evolution, the proposed system proved suitable for similarity comparison of distantly related proteins, and MSGD retrieved several important protein sequences that well-known residue-based matching methods failed to identify.
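    The abstract does not spell out how the LESS profile is computed, so the following is only a minimal sketch of the general idea of encoding secondary structure by segment length. It assumes a predicted secondary structure string over the usual three states (H = helix, E = strand, C = coil). The length bins, the token format, the function names, and the use of a generic sequence-similarity ratio are all hypothetical choices made for illustration and are not taken from the paper.

```python
from difflib import SequenceMatcher
from itertools import groupby

# Hypothetical length bins as (inclusive upper bound, label) pairs.
# Segments longer than the last bound get the label "l".
LENGTH_BINS = ((4, "s"), (10, "m"))


def segments(ss):
    """Run-length encode a secondary structure string into (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def length_bin(n):
    """Map a segment length to a coarse bin label (assumed bins, see above)."""
    for upper, label in LENGTH_BINS:
        if n <= upper:
            return label
    return "l"


def less_like_profile(ss, skip="C"):
    """Encode each helix/strand segment as state + length bin, dropping coil."""
    return [state + length_bin(n) for state, n in segments(ss) if state not in skip]


def profile_similarity(a, b):
    """Generic similarity ratio between two token lists (not the paper's score)."""
    return SequenceMatcher(None, a, b).ratio()


query = less_like_profile("CCHHHHHHEEEECC")
target = less_like_profile("CHHHHHHHHCCEEEC")
print(query, target, profile_similarity(query, target))
```

    Two proteins with low residue identity can still yield identical token lists here, which is the intuition behind comparing structure-derived profiles instead of residues.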
  • Keywords
    biology computing; genomics; information retrieval; proteins; DNA; Ensembl genetic database; Marine Species Genome Database; NCBI genetic database; RNA; biological techniques; genomic sequences; length encoded secondary structure profile; protein information; selected marine genomics database; Aquaculture; Bioinformatics; Biological techniques; Databases; Environmental factors; Evolution (biology); Genomics; Information retrieval; Proteins; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering and Informatics, 2009. BMEI '09. 2nd International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4244-4132-7
  • Electronic_ISBN
    978-1-4244-4134-1
  • Type
    conf
  • DOI
    10.1109/BMEI.2009.5305675
  • Filename
    5305675