Title :
Identifier and database from the same sequence repository provide the greatest number of correct pairings between RNA and protein data
Author :
Lee, Shang-Jung ; Kearney, Robert
Author_Institution :
Dept. of Biomed. Eng., McGill Univ., Montreal, QC, Canada
Abstract :
There is increasing interest in the integrated analysis of RNA and protein datasets to provide a more holistic description of a biological system. To do so, gene and protein identifiers can be paired by using databases that cross-link identifiers from multiple public databases. However, little is known about how using different identifiers and cross-linking databases influences the accuracy and completeness of the resulting gene-protein pairs. We investigated this by matching an existing dataset of rat proteins to their corresponding genes using different combinations of identifiers and databases from the NCBI and PIR repositories. We found that using an identifier and a cross-linking database from the same sequence repository yielded the most correct gene-protein pairs and consequently maximized the number of proteins that could be compared with their gene counterparts. Overall, using the GeneInfo identifier and the Entrez database, we matched 84% of the 4,016 proteins in our dataset to their corresponding genes.
Keywords :
bioinformatics; genetics; macromolecules; molecular biophysics; proteins; PIR repositories; RNA integrated analysis; cross-link identifier; gene identifier; gene-protein pair; holistic description; multiple public database; protein dataset; sequence repository; Biological systems; Biomedical engineering; Databases; Joining processes; Peptides; Protein engineering; Protein sequence; Proteomics; RNA; Testing;
Conference_Title :
Computer-Based Medical Systems, 2009. CBMS 2009. 22nd IEEE International Symposium on
Conference_Location :
Albuquerque, NM
Print_ISBN :
978-1-4244-4879-1
Electronic_ISBN :
1063-7125
DOI :
10.1109/CBMS.2009.5255446