• DocumentCode
    3077611
  • Title

    Confuga: Scalable Data Intensive Computing for POSIX Workflows

  • Author

    Donnelly, Patrick ; Hazekamp, Nicholas ; Thain, Douglas

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN, USA
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    392
  • Lastpage
    401
  • Abstract
    Today's big-data analysis systems achieve performance and scalability by requiring end users to embrace a novel programming model. This approach is highly effective when the objective is to compute relatively simple functions on colossal amounts of data, but it is not a good match for a scientific computing environment, which depends on complex applications written for the conventional POSIX environment. To address this gap, we introduce Confuga, a scalable data-intensive computing system that is largely compatible with the POSIX environment. Confuga brings together the workflow model of scientific computing with the storage architecture of other big data systems. Confuga accepts large workflows of standard POSIX applications arranged into graphs, and then executes them in a cluster, exploiting both parallelism and data-locality. By making use of the workload structure, Confuga is able to avoid the long-standing problems of metadata scalability and load instability found in many large-scale computing and storage systems. We show that Confuga's approach to load control offers improvements of up to 228% in cluster network utilization and 23% reductions in workflow execution time.
  • Keywords
    Big Data; natural sciences computing; operating systems (computers); parallel processing; storage management; Big Data systems; Confuga; POSIX workflows; active storage cluster file system; data-intensive computing system; data-locality; parallelism; scientific computing; storage architecture; Bioinformatics; Chirp; Computer architecture; Genomics; Protocols; Semantics; Servers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type

    conf

  • DOI
    10.1109/CCGrid.2015.95
  • Filename
    7152505