DocumentCode
2198055
Title
A Database of Selected Marine Genomics for Retrieving Distantly Related Proteins
Author
Shih, Tsan-Huang ; Hsu, Yen-Chu ; Pai, Tun-Wen ; Tzou, Wen-Shyong ; Hu, Chin-Hua
Author_Institution
Center for Marine Biosci. & Biotechnol., Nat. Taiwan Ocean Univ., Keelung, Taiwan
fYear
2009
fDate
17-19 Oct. 2009
Firstpage
1
Lastpage
5
Abstract
With advances in biological techniques, research in marine evolution, ecology, and aquaculture has grown explosively in both volume and diversity. Tens of thousands of genomic sequences are now available for important marine species. However, most of the corresponding structures and functions remain unresolved. To discover the biological characteristics of a marine species' genomic sequences, an efficient and effective method for detecting distantly related proteins, based on experimentally known functions from model species, is an important strategy. In this study, the Ensembl and NCBI genetic databases were used to build a preliminary database of selected marine species. The system contains abundant DNA, RNA, and protein information and is named the Marine Species Genome Database (MSGD). To identify remotely related proteins, we propose a novel LESS (length encoded secondary structure) profile that improves information retrieval, especially for protein sequences without resolved structures and with low sequence identity. The matching algorithms apply several existing secondary structure prediction techniques and an encoding mechanism based on the length distribution of secondary structures. Because protein secondary structures are conserved in evolution, the proposed system proved suitable for similarity comparison of distantly related proteins, and MSGD retrieved several important protein sequences that well-known residue-based matching methods failed to identify.
Keywords
biology computing; genomics; information retrieval; proteins; DNA; Ensembl genetic database; Marine Species Genome Database; NCBI genetic database; RNA; biological techniques; genomic sequences; length encoded secondary structure profile; protein information; selected marine genomics database; Aquaculture; Bioinformatics; Biological techniques; Databases; Environmental factors; Evolution (biology); Genomics; Information retrieval; Proteins; Sequences;
fLanguage
English
Publisher
IEEE
Conference_Titel
Biomedical Engineering and Informatics, 2009. BMEI '09. 2nd International Conference on
Conference_Location
Tianjin
Print_ISBN
978-1-4244-4132-7
Electronic_ISBN
978-1-4244-4134-1
Type
conf
DOI
10.1109/BMEI.2009.5305675
Filename
5305675