• DocumentCode
    446837
  • Title
    COMA: An Opportunity for Building Fault-Tolerant Scalable Shared Memory Multiprocessors
  • Author
    Gefflaut, Alain ; Banatre, M. ; Kermarrec, Anne-Marie ; Morin, Christine
  • fYear
    1996
  • fDate
    22-24 May 1996
  • Firstpage
    56
  • Lastpage
    56
  • Abstract
    Due to their increasing number of components, Scalable Shared Memory Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating node failures therefore becomes very important for these architectures, particularly if they must be used for long-running computations. In this paper, we show that Cache Only Memory Architectures (COMAs) are good candidates for building fault-tolerant SSMMs. A backward error recovery strategy can be implemented without significant hardware modification to previously proposed COMAs by exploiting their standard replication mechanisms and extending the coherence protocol to transparently manage recovery data. Evaluation of the proposed fault-tolerant COMA is based on execution-driven simulations using some of the SPLASH applications. We show that, for the simulated architecture, the performance degradation caused by the fault-tolerance mechanisms varies from 5% in the best case to 35% in the worst case. The standard memory behavior is only slightly perturbed. Moreover, the results show that the proposed scheme preserves the scalability of the architecture and that the memory overhead remains low for parallel applications using mostly shared data.
  • Keywords
    Scalable Shared Memory Multiprocessors; backward error recovery; coherence protocol; fault-tolerance; Bioinformatics; Bit error rate; Buildings; Degradation; Fault tolerance; Genomics; Hardware; Memory architecture; Memory management; Permission
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture, 1996 23rd Annual International Symposium on
  • ISSN
    1063-6897
  • Print_ISBN
    0-89791-786-3
  • Type
    conf
  • DOI
    10.1109/ISCA.1996.10022
  • Filename
    1563035