  • DocumentCode
    2805987
  • Title
    High performance fault tolerant computer and its fault recovery
  • Author
    Nakamikawa, Tetsuaki ; Morita, Yuuichirou ; Yamaguchi, Shinichirou ; Ishikawa, Sakou ; Miyazaki, Yoshihiro
  • Author_Institution
    Hitachi Res. Lab., Japan
  • fYear
    1997
  • fDate
    15-16 Dec 1997
  • Firstpage
    2
  • Lastpage
    6
  • Abstract
    The authors proposed a new architecture for a fault tolerant computer (FTC), called QPR (Quad Processor Redundancy), in which duplicated CPUs operate in hardware lock step and duplicated I/Os are managed by software. A dual system bus connects the two duplicated areas. After recovery from a fault, the system must be resynchronized, so the contents of main memory must be copied from the normal CPU to the other CPU. The copying overhead must be small so that the normal CPU can continue running the application. The authors describe a fault recovery method, focusing on the memory copying method: whenever a memory access occurs, the memory interface unit snoops the data and sends it to the other CPU over the dual system bus. They measured the copy time on the real machine and simulated the copy overhead under a heavy DMA load, obtaining a small overhead and a small load dependency.
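    The snoop-based memory copy summarized in the abstract can be sketched as a small simulation. This is a minimal Python model under simplifying assumptions, not the authors' implementation: memory is a list of words, the names `MemoryInterfaceUnit` and `resynchronize` are hypothetical, and the background copy is assumed to be a sequential read sweep whose snooped read data the memory interface unit forwards, together with any concurrent application or DMA writes, to the recovering side.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryInterfaceUnit:
    """Hypothetical model of the normal side's memory interface unit (MIU).

    Every access passing through it (read or write) is snooped, and the
    word is forwarded to the recovering side over the dual system bus.
    """
    local: List[int]    # main memory of the normal CPU
    remote: List[int]   # main memory of the CPU being resynchronized
    forwarded: int = 0  # words sent over the dual system bus

    def _snoop(self, addr: int) -> None:
        # Forward the current contents of this word to the other side.
        self.remote[addr] = self.local[addr]
        self.forwarded += 1

    def read(self, addr: int) -> int:
        value = self.local[addr]
        self._snoop(addr)
        return value

    def write(self, addr: int, value: int) -> None:
        self.local[addr] = value
        self._snoop(addr)


def resynchronize(miu: MemoryInterfaceUnit, writes_per_step: int, seed: int = 0) -> int:
    """Copy memory with a read sweep while the workload keeps writing.

    Assumption: `writes_per_step` random writes (standing in for
    application and DMA traffic) occur between consecutive copy reads.
    """
    rng = random.Random(seed)
    size = len(miu.local)
    for addr in range(size):
        # Background copy: a plain read suffices because the MIU snoops it.
        miu.read(addr)
        # Concurrent writes are also snooped, so words that were already
        # copied never go stale on the recovering side.
        for _ in range(writes_per_step):
            miu.write(rng.randrange(size), rng.randrange(1 << 16))
    return miu.forwarded


if __name__ == "__main__":
    size = 1024
    miu = MemoryInterfaceUnit(list(range(size)), [0] * size)
    sent = resynchronize(miu, writes_per_step=2)
    assert miu.local == miu.remote  # both memories now match
    print(sent)  # 3072: one copy read plus two snooped writes per address
```

    In this model, a word written before the sweep reaches it is refreshed by the copy read, and a word written afterwards is forwarded by the write snoop, so the two memories agree once the sweep finishes without stopping the workload.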
  • Keywords
    computer architecture; fault tolerant computing; input-output programs; redundancy; reliability; synchronisation; system buses; system recovery; virtual machines; QPR; Quad Processor Redundancy; architecture; copy overhead simulation; copy time measurement; data snooping; dual system bus; duplicated CPUs; duplicated I/Os; duplicated areas; fault recovery; hardware lock step; heavy DMA load; high performance fault tolerant computer; load dependency; main memory contents copying; memory access; memory interface unit; resynchronization; Application software; Central Processing Unit; Computer architecture; Fault tolerance; Hardware; High performance computing; Laboratories; Operating systems; Redundancy; System buses;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Pacific Rim International Symposium on Fault-Tolerant Systems, 1997
  • Conference_Location
    Taipei
  • Print_ISBN
    0-8186-8212-4
  • Type
    conf
  • DOI
    10.1109/PRFTS.1997.640117
  • Filename
    640117