• DocumentCode
    127584
  • Title

    Addressing the Shimming Problem in Big Data Scientific Workflows

  • Author

    Mohan, Archith ; Shiyong Lu ; Kotov, Alexander

  • Author_Institution
    Wayne State Univ., Detroit, MI, USA
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    347
  • Lastpage
    354
  • Abstract
    A substantial amount of research has been done recently to address the shimming problem in scientific workflows, in which special adaptors, called shims, are inserted between workflow tasks to resolve data type incompatibilities. Scientific workflows are increasingly used for big data analysis and processing, which poses additional challenges to the shimming problem, such as the volume, velocity, and variety of data. One issue is scaling the registration and configuration procedure to a large number of workflow tasks. Another is the ease of integrating a large number of remote Web services and other heterogeneous task components, which can consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches fall short in usability and scalability in addressing these issues. In this paper, we 1) propose a new, simplified single-component task model based on extensive experience and lessons learned from our original multiple-component task model; the new model separates registration from configuration and eases the process of registering external functional components (such as Web services) into p-workflows, 2) propose a shim generation algorithm that elegantly solves the shimming problem raised by Web service based scientific workflows, and 3) integrate MongoDB, a NoSQL document-oriented database system, for storing and managing large-scale unstructured documents. A new version of the DATAVIEW system has been developed to support the proposed techniques, and a case study has been conducted to show their feasibility and usability.
  • Keywords
    Big Data; SQL; Web services; data analysis; relational databases; Big Data analysis; Big Data scientific workflows; DATAVIEW system; MongoDB; NoSQL document-oriented database system; Web services; data type incompatibility; heterogeneous task components; shimming problem; Big data; Data models; Databases; Ports (Computers); Registers; Web services; XML; big data; scientific workflow; shimming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Computing (SCC), 2014 IEEE International Conference on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5065-2
  • Type

    conf

  • DOI
    10.1109/SCC.2014.53
  • Filename
    6930553