• DocumentCode
    2572494
  • Title

    A Task-Based Fault-Tolerance Mechanism to Hierarchical Master/Worker with Divisible Tasks

  • Author

    Dai, Zhihui ; Viale, Fabien ; Chi, Xuebin ; Caromel, Denis ; Lu, Zhonghua

  • Author_Institution
    Comput. Network & Inf. Center, China Acad. of Sci., Beijing, China
  • fYear
    2009
  • fDate
    25-27 June 2009
  • Firstpage
    672
  • Lastpage
    677
  • Abstract
    The master/worker API of the ProActive middleware provides an easy-to-use framework for parallelizing embarrassingly parallel applications. However, the traditional master/worker model faces great challenges as distributed computing grows in scale. A single-layer hierarchical master/worker has been implemented as a solution to the scalability issues of the master/worker (MW) API. In the new framework, the mainmaster communicates only with some submasters, and each submaster manages a set of workers. A "bully election algorithm" and an "object discovery mechanism" are implemented to solve the fault-tolerance problems of the submasters. An automatic load-balancing mechanism is implemented for the hierarchical master/worker to solve divisible tasks. Moreover, the fault-tolerance mechanism has been optimized to make it more efficient.
  • Keywords
    fault tolerant computing; middleware; parallel processing; resource allocation; API; ProActive middleware; automatic load-balancing mechanism; bully election algorithm; distributed computing scalability; divisible tasks; object discovery mechanism; parallel applications; single-layer hierarchical master-worker; task-based fault-tolerance mechanism; Computer networks; Distributed computing; Fault tolerance; High performance computing; Java; Libraries; Middleware; Nominations and elections; Parallel programming; Scalability; ProActive; divisible task; fault-tolerance; hierarchical master/worker; load-balancing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications, 2009. HPCC '09. 11th IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-4600-1
  • Electronic_ISBN
    978-0-7695-3738-2
  • Type

    conf

  • DOI
    10.1109/HPCC.2009.35
  • Filename
    5167062