• DocumentCode
    147782
  • Title
    Automatic checkpointing based fault tolerance in computational grid
  • Author
    Babu, Ch Ratna; Rao, C. D. V. Subba
  • Author_Institution
    Dept. of CSE, JNTU Kakinada, Kakinada, India
  • fYear
    2014
  • fDate
    27-29 April 2014
  • Firstpage
    41
  • Lastpage
    45
  • Abstract
    Although technology changes quickly, ever more sophisticated computational techniques are still needed to keep pace with it. The workload logs of most computational grids show that node or job failure is the major challenge to deal with. Very robust scheduling algorithms are used to handle varied resource allocation in computational grids. However, previous studies leave a need to remedy failures and delays in executing jobs with respect to resource availability, through an approach that handles both scheduling and efficient failure handling in large-scale high-performance computational applications. Consequently, the major issue addressed here is fault tolerance: tolerating failures with regard to job scheduling, together with an efficient failure-handling mechanism. Synchronization is therefore needed to combine both techniques. The fault-tolerance techniques most commonly used in widely deployed computational applications are periodic job checkpointing and replication, yet most job checkpointing techniques are not based on the scheduling algorithm. This work presents an automated checkpointing strategy for computational grids based on different scheduling algorithms. Experimental results show that the proposed automated checkpointing of jobs, based on a fault-tolerant scheduling strategy, achieves a considerable improvement over conventional adaptive checkpointing algorithms.
  • Keywords
    checkpointing; grid computing; parallel processing; resource allocation; scheduling; software fault tolerance; synchronisation; adaptive checkpointing algorithm; automated checkpointing strategy; automatic checkpointing based fault tolerance; computational grid; computational techniques; failure handling mechanism; fault tolerant scheduling strategy; high performance computational application; job checkpointing techniques; job failure; job scheduling; periodic job checkpointing; resource availability; scheduling algorithm; synchronization; Checkpointing; Fault tolerance; Fault tolerant systems; Kernel; Scheduling algorithms; Torque; Automatic checkpointing; Computational Grid; node failure; replication
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 International Conference on Computing, Management and Telecommunications (ComManTel)
  • Conference_Location
    Da Nang
  • Print_ISBN
    978-1-4799-2904-7
  • Type
    conf
  • DOI
    10.1109/ComManTel.2014.6825575
  • Filename
    6825575