• DocumentCode
    3384532
  • Title

    Component Based Proactive Fault Tolerant Scheduling in Computational Grid

  • Author

    Haider, Sajjad ; Imran, Muhammad ; Niaz, Iftikhar Azim ; Ullah, Saeed ; Ansari, M.A.

  • Author_Institution
    Inf. Technol. Dept., Nat. Univ. of Modern Languages, Islamabad
  • fYear
    2007
  • fDate
    12-13 Nov. 2007
  • Firstpage
    119
  • Lastpage
    124
  • Abstract
    Computational Grids can serve as the main execution platform for high-performance distributed applications. Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media, so they are extremely complex and prone to different kinds of errors, failures and faults. The Grid is a layered architecture, and most fault-tolerance techniques developed for grids follow its strict layering approach. In this paper, we propose a cross-layer design for handling faults proactively. In a cross-layer design, the top-down and bottom-up approach is not strictly followed, and a middle layer can communicate with the layer below or above it [1]. At each grid layer, a monitoring component decides, based on predefined factors, whether the reliability of that particular layer is high, medium or low. Based on the Hardware Reliability Rating (HRR) and Software Reliability Rating (SRR), the Middleware Monitoring Component / Cross-Layered Component (MMC/CLC) generates a Combined Rating (CR) using CR calculation matrix rules. Each participating grid node has a CR value generated through cross-layered communication using HMC, MMC/CLC and SMC. All grid nodes keep their CR information in a CR table, and high-rated machines are selected for job execution on the basis of minimum CPU load, along with different intensities of checkpointing. Handling faults proactively at each grid layer using the cross-communication model should result in improved overall dependability and increased performance with less checkpointing overhead.
  • Keywords
    checkpointing; error handling; fault tolerant computing; grid computing; middleware; object-oriented programming; scheduling; system monitoring; check pointing; combined rating table; computational grid; cross-layered component; hardware reliability rating; high performance distributed application; matrix rule; middleware monitoring component; proactive fault tolerant scheduling; software reliability rating; Chromium; Computer architecture; Cross layer design; Distributed computing; Fault tolerance; Grid computing; High performance computing; Monitoring; Nonhomogeneous media; Processor scheduling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Technologies, 2007. ICET 2007. International Conference on
  • Conference_Location
    Islamabad
  • Print_ISBN
    978-1-4244-1493-2
  • Electronic_ISBN
    978-1-4244-1494-9
  • Type

    conf

  • DOI
    10.1109/ICET.2007.4516328
  • Filename
    4516328
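
The selection scheme described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not give the CR calculation matrix rules, so the rule used here (CR is the weaker of HRR and SRR) is an assumption. The names `Node`, `combined_rating`, `select_node` and the `CHECKPOINT_INTENSITY` mapping are likewise hypothetical.

```python
from dataclasses import dataclass

# Ordered reliability levels, weakest first (levels taken from the abstract).
RATINGS = ("low", "medium", "high")

# Hypothetical mapping: less reliable nodes are checkpointed more intensively.
CHECKPOINT_INTENSITY = {"high": "light", "medium": "moderate", "low": "heavy"}


@dataclass
class Node:
    name: str
    hrr: str         # Hardware Reliability Rating
    srr: str         # Software Reliability Rating
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0


def combined_rating(hrr: str, srr: str) -> str:
    """Assumed CR rule: the combined rating is the weaker of HRR and SRR."""
    return min(hrr, srr, key=RATINGS.index)


def select_node(nodes: list[Node]) -> tuple[Node, str]:
    """Pick the least-loaded node among those with the best CR.

    Returns the chosen node and the checkpoint intensity for its CR.
    """
    # Build the CR table: node name -> combined rating.
    cr_table = {n.name: combined_rating(n.hrr, n.srr) for n in nodes}
    best = max(cr_table.values(), key=RATINGS.index)
    candidates = [n for n in nodes if cr_table[n.name] == best]
    chosen = min(candidates, key=lambda n: n.cpu_load)
    return chosen, CHECKPOINT_INTENSITY[best]


if __name__ == "__main__":
    grid = [
        Node("a", "high", "high", 0.7),
        Node("b", "high", "medium", 0.1),
        Node("c", "high", "high", 0.3),
    ]
    node, intensity = select_node(grid)
    print(node.name, intensity)
```

With this assumed rule, node `b` is passed over despite its low load because its SRR lowers its CR to medium. The choice then falls to the less loaded of the two high-rated nodes.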