DocumentCode :
3073978
Title :
Data Matching for Physical Integration of Biochemical Pathway Databases
Author :
Tsay, Jyh-Jong ; Wu, Bo-Liang ; Chen, Chien-Wen
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chung-Cheng Univ., Chiayi, Taiwan
fYear :
2009
fDate :
22-24 June 2009
Firstpage :
216
Lastpage :
220
Abstract :
Because databases can overlap, data matching, which aims to identify records or elements that describe the same object, is one of the fundamental problems in the physical integration of databases. Matching results can be used to derive more accurate and complete object descriptions, remove redundant data, check data consistency, and generate cross-links. In this paper, we present a multilevel approach to matching data between pathway databases that contain three levels of information: compounds, reactions, and pathways. Reactions are composed of compounds, and pathways are composed of reactions. Our main idea is to use relationships discovered at lower levels to infer relationships at upper levels, and to use relationships at upper levels to enhance relationships at lower levels. In particular, we first use the attributes of compounds to identify compound matches, i.e., to map compounds in one database to those in the other. Compound matching is then used to induce reaction matching. Reaction matching is in turn used to induce pathway matching as well as to enhance compound matching. We evaluate our approach by integrating two well-known pathway databases, KEGG and MetaCyc. The experiment shows that our approach identifies 1025 matches at the compound level, 968 matches at the reaction level, and 387 relations at the pathway level. Based on the matching results, we assign EC numbers to 84 MetaCyc reactions that lack them, and we discover some duplication errors in both databases. Furthermore, our approach identifies 532 of the 544 unification links provided by MetaCyc, for a recall of 0.977.
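The multilevel idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the matching rules, so the name-based compound comparison, the "all but one compound agrees" enhancement rule, the overlap threshold, and every identifier below (`a1`, `rb1`, `match_compounds`, and so on) are assumptions chosen for illustration. The toy records are invented and are not real KEGG or MetaCyc entries.

```python
def norm(name):
    """Normalize a compound name for comparison (assumed rule: case and whitespace)."""
    return name.strip().lower()


def match_compounds(comps_a, comps_b):
    """Level 1: map compound ids in A to ids in B when normalized names agree."""
    by_name = {norm(name): cid for cid, name in comps_b.items()}
    return {cid: by_name[norm(name)]
            for cid, name in comps_a.items() if norm(name) in by_name}


def enhance_compounds(rxns_a, rxns_b, cmap):
    """Feedback step: if a reaction in A has exactly one unmapped compound and
    exactly one compound in B can complete it, infer that compound pair."""
    new = dict(cmap)
    for ca in rxns_a.values():
        unmapped = [c for c in ca if c not in cmap]
        if len(unmapped) != 1:
            continue
        mapped = {cmap[c] for c in ca if c in cmap}
        candidates = set()
        for cb in rxns_b.values():
            rest = set(cb) - mapped
            if mapped <= set(cb) and len(set(cb)) == len(set(ca)) and len(rest) == 1:
                candidates |= rest
        if len(candidates) == 1:
            new[unmapped[0]] = candidates.pop()
    return new


def match_reactions(rxns_a, rxns_b, cmap):
    """Level 2: match reactions whose compound sets coincide under the compound map."""
    matches = {}
    for ra, ca in rxns_a.items():
        if not all(c in cmap for c in ca):
            continue
        mapped = frozenset(cmap[c] for c in ca)
        for rb, cb in rxns_b.items():
            if mapped == frozenset(cb):
                matches[ra] = rb
                break
    return matches


def match_pathways(paths_a, paths_b, rmap, threshold=0.5):
    """Level 3: relate pathways that share enough matched reactions."""
    relations = {}
    for pa, ra in paths_a.items():
        mapped = {rmap[r] for r in ra if r in rmap}
        for pb, rb in paths_b.items():
            score = len(mapped & set(rb)) / max(len(set(ra)), len(set(rb)))
            if score >= threshold:
                relations[(pa, pb)] = score
    return relations


def demo():
    # Hypothetical toy data: database A spells one compound differently from B.
    comps_a = {"a1": "ATP", "a2": "ADP", "a3": "D-Glucose", "a4": "G6P"}
    comps_b = {"b1": "atp", "b2": "adp", "b3": "d-glucose", "b4": "glucose 6-phosphate"}
    rxns_a = {"ra1": ["a1", "a3", "a2", "a4"]}
    rxns_b = {"rb1": ["b1", "b3", "b2", "b4"]}
    paths_a = {"pa1": ["ra1"]}
    paths_b = {"pb1": ["rb1"]}

    cmap = match_compounds(comps_a, comps_b)         # attribute-based matches
    cmap = enhance_compounds(rxns_a, rxns_b, cmap)   # reaction context adds a4 -> b4
    rmap = match_reactions(rxns_a, rxns_b, cmap)
    pmap = match_pathways(paths_a, paths_b, rmap)
    return cmap, rmap, pmap
```

In this toy case, name comparison alone leaves `a4` unmatched; the reaction context supplies the missing pair, after which the reaction and pathway matches follow. A real integration would need richer compound attributes (formulas, structures, synonyms) and would have to resolve ambiguous candidates rather than skip them.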
Keywords :
biochemistry; biology computing; database management systems; KEGG pathway database; MetaCyc pathway database; biochemical pathway databases; compound matching; data matching; map compounds; multilevel approach; reaction matching; Bioinformatics; Biology computing; Computer errors; Computer science; Data engineering; Databases; Information analysis; Information retrieval; Organisms; Redundancy; Biochemical Pathway; Data Matching; Integration; KEGG; MetaCyc;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and BioEngineering, 2009. BIBE '09. Ninth IEEE International Conference on
Conference_Location :
Taichung
Print_ISBN :
978-0-7695-3656-9
Type :
conf
DOI :
10.1109/BIBE.2009.48
Filename :
5211283