DocumentCode :
1308588
Title :
DataFoundry: information management for scientific data
Author :
Critchlow, Terence ; Fidelis, Krzysztof ; Ganesh, Madhavan ; Musick, Ron ; Slezak, Tom
Author_Institution :
Center for Appl. Sci. Comput., Lawrence Livermore Nat. Lab., CA, USA
Volume :
4
Issue :
1
fYear :
2000
fDate :
3/1/2000
Firstpage :
52
Lastpage :
57
Abstract :
Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.
Keywords :
data structures; data warehouses; meta data; scientific information systems; DataFoundry; DataFoundry project; business sources; commercial business applications; data marts; distributed heterogeneous sources; dynamic environment; highly dynamic data representations; information dissemination; information management; mediator; meta-data infrastructure; novel architecture; schema integration; scientific data; scientific data sources; scientific databases; scientific environments; traditional warehousing techniques; unique integration strategy; working warehouses; Associate members; Bioinformatics; Business; Costs; Data analysis; Databases; Information management; Proteins; Sequences; Warehousing; Computer Systems; Costs and Cost Analysis; Database Management Systems; Databases as Topic; Humans; Information Management; Information Services; Information Systems; Science; Systems Integration
fLanguage :
English
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
Publisher :
IEEE
ISSN :
1089-7771
Type :
jour
DOI :
10.1109/4233.826859
Filename :
826859