DocumentCode
2187746
Title
Logical Optimization of Dataflows for Data Mining and Integration Processes
Author
Wöhrer, Alexander ; Mehofer, Eduard ; Brezany, Peter
Author_Institution
Dept. of Sci. Comput., Univ. of Vienna, Vienna, Austria
fYear
2010
fDate
7-10 Dec. 2010
Firstpage
117
Lastpage
122
Abstract
Modern scientific collaborations require large-scale data mining and integration processes. Their investigations involve multi-disciplinary expertise and large-scale computational experiments on top of large amounts of data that are located in distributed data repositories running various software systems and managed by different organizations. Higher-level dataflow languages are used on top of parallel dataflow systems to enable faster program development and more maintainable code. Logical and physical optimization should be applied to a dataflow prior to its execution to improve performance. In this paper, we present the rationale, theory, design and application of logical optimization of dataflows for data mining and integration processes. A dataflow model is defined and several optimization algorithms, namely dead element elimination, process re-ordering, parallelization, and data by-passing, are developed. The first research prototype of the framework has been implemented in the context of the ADMIRE Data Mining and Integration Process Designer for logical optimization of specifications expressed in the DISPEL language developed in the ADMIRE project.
Keywords
data flow analysis; data mining; high level languages; optimisation; software engineering; ADMIRE data mining; DISPEL language; data by passing; dead element elimination; distributed data repository; higher level dataflow language; integration process designer; large scale computational experiment; logical optimization; multidisciplinary expertise; parallel dataflow system; process reordering; scientific collaboration; software system; Adaptation model; Computational modeling; Data mining; Data models; Distributed databases; Optimization; Process control; data-intensive research; dataflows; logical optimization
fLanguage
English
Publisher
IEEE
Conference_Titel
e-Science Workshops, 2010 Sixth IEEE International Conference on
Conference_Location
Brisbane, QLD
Print_ISBN
978-1-4244-8988-6
Electronic_ISBN
978-0-7695-4295-9
Type
conf
DOI
10.1109/eScienceW.2010.28
Filename
5693151