DocumentCode :
2552677
Title :
Enabling information integration and workflows in a grid environment with automatic wrapper generation
Author :
Zhang, Xuan ; Agrawal, Gagan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2005
fDate :
13-14 Nov. 2005
Abstract :
With a growing trend towards grid-based data repositories and data analysis services, scientific data analysis often involves accessing multiple data sources and analyzing the data with a variety of analysis programs. One critical challenge, however, is that data sources often hold the same type of data in a number of different formats, and the formats expected and generated by various data analysis services are often distinct. We believe that the traditional approach to this problem, hand-written wrappers, is not an effective or scalable solution for a grid environment. This paper presents a new approach, which involves generating wrappers automatically to enable grid-based information integration and workflows. In this approach, a layout descriptor is used to describe the data format of each data source, as well as the input and output formats of each tool or service. Efficient wrappers are then generated automatically for translation between any two data formats. Our design separates the wrapper generation service from wrapper execution. The wrapper generation service analyzes the layout descriptors and generates a WRAPINFO data structure. The wrapper comprises a set of application-independent modules that take the WRAPINFO data structure as input. We demonstrate our wrapper generation tool with two real case studies. Besides showing the effectiveness of our system, the experimental results from these two case studies show that the wrapper generation overhead is very small, that automatically generated wrappers scale well to large datasets, and that, for the one case where this comparison was possible, the execution time of our wrapper was within 30% of that of a hand-written one.
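The split between wrapper generation and wrapper execution can be illustrated with a minimal sketch. It assumes a drastically simplified, hypothetical layout descriptor (one delimited text line per record with named fields) and a hypothetical WrapInfo class standing in for WRAPINFO; the paper's actual descriptor language and WRAPINFO contents are richer and are not reproduced here. The point shown is only that the generation step reads the two descriptors once, while the execution step is a generic routine driven entirely by the generated structure.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a layout descriptor:
# each record is one line of delimited text with named fields in a fixed order.
@dataclass(frozen=True)
class LayoutDescriptor:
    fields: tuple[str, ...]
    delimiter: str


# Hypothetical stand-in for WRAPINFO: everything the executor needs,
# computed once from the source and target descriptors.
@dataclass(frozen=True)
class WrapInfo:
    source_delimiter: str
    target_delimiter: str
    field_map: tuple[int, ...]  # for each target field, its index in a source record


def generate_wrapinfo(src: LayoutDescriptor, dst: LayoutDescriptor) -> WrapInfo:
    """Generation step: analyze the two descriptors only; no data is read here."""
    index = {name: i for i, name in enumerate(src.fields)}
    missing = [f for f in dst.fields if f not in index]
    if missing:
        raise ValueError(f"target fields not present in source: {missing}")
    return WrapInfo(src.delimiter, dst.delimiter, tuple(index[f] for f in dst.fields))


def execute_wrapper(info: WrapInfo, lines):
    """Execution step: a generic translator that knows nothing about either format
    except what the WrapInfo tells it."""
    for line in lines:
        values = line.rstrip("\n").split(info.source_delimiter)
        yield info.target_delimiter.join(values[i] for i in info.field_map)


# Usage with made-up formats: tab-separated source, comma-separated target
# that reorders and drops fields.
src = LayoutDescriptor(("id", "chrom", "pos", "allele"), "\t")
dst = LayoutDescriptor(("chrom", "pos", "id"), ",")
info = generate_wrapinfo(src, dst)
print(list(execute_wrapper(info, ["rs1\tchr1\t100\tA", "rs2\tchr2\t250\tG"])))
```

Keeping the executor free of format-specific logic is what lets one set of modules serve any pair of formats; only the generated structure changes from one pair to the next.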
Keywords :
data analysis; data structures; grid computing; natural sciences computing; WRAPINFO data structure; automatic wrapper generation; data analysis services; data formats; grid environment; grid-based data repositories; grid-based workflows; information integration; layout descriptors; multiple data source access; scientific data analysis; Bioinformatics; Computer science; Data analysis; Data engineering; Data structures; Databases; Grid computing; Mesh generation; Project management; Resource management;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Grid Computing, 2005. The 6th IEEE/ACM International Workshop on
Print_ISBN :
0-7803-9492-5
Type :
conf
DOI :
10.1109/GRID.2005.1542737
Filename :
1542737