• DocumentCode
    659435
  • Title

    Adaptive file management for scientific workflows on the Azure cloud

  • Author

    Tudoran, Radu ; Costan, Alexandru ; Rad, Ramin Rezai ; Brasche, Goetz ; Antoniu, Gabriel

  • Author_Institution
    Microsoft Research-Inria Joint Centre, Palaiseau, France
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    273
  • Lastpage
    281
  • Abstract
    Scientific workflows typically communicate data between tasks using files. Currently, on public clouds, this is achieved by using the cloud storage services, which are unable to exploit the workflow semantics and are subject to low throughput and high latencies. To overcome these limitations, we propose an alternative leveraging data locality through direct file transfers between the compute nodes. We rely on the observation that workflows generate a set of common data access patterns that our solution exploits in conjunction with context information to self-adapt, choose the most adequate transfer protocol and expose the data layout within the virtual machines to the workflow engines. This file management system was integrated within the Microsoft Generic Worker workflow engine and was validated using synthetic benchmarks and a real-life application on the Azure cloud. The results show it can bring significant performance gains: up to 5x file transfer speedup compared to solutions based on standard cloud storage and over 25% application timespan reduction compared to Hadoop on Azure.
  • Keywords
    cloud computing; file organisation; natural sciences computing; virtual machines; Azure cloud; Hadoop; Microsoft Generic Worker workflow engine; adaptive file management; application timespan reduction; cloud storage services; common data access patterns; compute nodes; data layout; data locality; direct file transfers; file management system; file transfer speedup; scientific workflows; standard cloud storage; synthetic benchmarks; transfer protocol; virtual machines; Cloud computing; Context; Data transfer; Engines; Programming; Protocols; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type

    conf

  • DOI
    10.1109/BigData.2013.6691584
  • Filename
    6691584