• DocumentCode
    3149090
  • Title
    Area failures and reliable distributed applications
  • Author
    Nakechbandi, Moustafa ; Colin, Jean-Yves
  • Author_Institution
    LITIS Lab., Le Havre Univ., Le Havre, France
  • fYear
    2009
  • fDate
    14-16 Dec. 2009
  • Firstpage
    79
  • Lastpage
    85
  • Abstract
    Because failures sometimes affect whole areas rather than individual computers, we propose a new, efficient scheduling algorithm for problems in which tasks with precedence constraints and communication delays must be scheduled on a virtual heterogeneous distributed multi-area system subject to the possible failure of one complete area. Based on an extension of the critical-path method (CPM/PERT), our algorithm combines a schedule that is optimal when there are no failures with some task duplication to provide fault tolerance if one area fails. Backup copies are not created for tasks that already have more than one original copy in different areas. The result is a schedule, computed in polynomial time, that is optimal when no area fails and is a good, reliable schedule if any single area fails. Finally, we run numerical experiments in which we apply our algorithm to several semi-random DAGs and compare the optimal solutions with the reliable solutions found by the algorithm.
  • Keywords
    directed graphs; distributed processing; fault tolerant computing; scheduling; CPM; PERT; area failure; critical-path method; directed acyclic graph; fault failure; scheduling algorithm; semirandom DAG; virtual heterogeneous distributed multiareas system; Application software; Computer crashes; Computer hacking; Delay; Distributed computing; Fault tolerance; Optimal scheduling; Polynomials; Processor scheduling; Scheduling algorithm; DAG; area failure; catastrophic crash; fault tolerance; heterogeneous systems; reliable applications; scheduling with communication;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering & Systems, 2009. ICCES 2009. International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-5842-4
  • Electronic_ISBN
    978-1-4244-5843-1
  • Type
    conf
  • DOI
    10.1109/ICCES.2009.5383307
  • Filename
    5383307
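
Illustrative sketch (not the authors' algorithm): the abstract describes a CPM/PERT-style schedule over a task DAG with communication delays, plus backup copies only for tasks that lack original copies in more than one area. The Python below shows one way those two ideas could look. The function names, the dictionary layout, and the rule that a communication delay applies only when two tasks sit in different areas are all assumptions made for this example; none of them come from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_starts(duration, comm_delay, area_of):
    """CPM-style forward pass over a task DAG.

    duration:   task -> processing time
    comm_delay: (src, dst) -> delay on that precedence edge
    area_of:    task -> area the task is placed in (hypothetical layout)
    Assumption: the delay is paid only when src and dst are in different areas.
    """
    preds = {task: [] for task in duration}
    for (src, dst), delay in comm_delay.items():
        preds[dst].append((src, delay))

    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = TopologicalSorter(
        {task: [src for src, _ in preds[task]] for task in duration}
    ).static_order()

    start = {}
    for task in order:
        start[task] = max(
            (
                start[src] + duration[src]
                + (delay if area_of[src] != area_of[task] else 0)
                for src, delay in preds[task]
            ),
            default=0,  # tasks without predecessors start at time 0
        )
    return start


def tasks_needing_backup(copy_areas):
    """Return tasks whose original copies all sit in a single area.

    copy_areas: task -> list of areas holding an original copy.
    Tasks already present in two or more areas survive any one area failure.
    """
    return sorted(t for t, areas in copy_areas.items() if len(set(areas)) < 2)


# Small hypothetical instance: tasks a and b both precede c.
duration = {"a": 2, "b": 3, "c": 1}
comm_delay = {("a", "c"): 4, ("b", "c"): 1}
area_of = {"a": "area1", "b": "area2", "c": "area2"}

start = earliest_starts(duration, comm_delay, area_of)
makespan = max(start[t] + duration[t] for t in duration)
backups = tasks_needing_backup(
    {"a": ["area1", "area2"], "b": ["area2"], "c": ["area2"]}
)
```

In this instance task c waits for a's result to cross areas, so the inter-area edge determines the critical path; task a already has copies in two areas and would need no backup under the stated assumption.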