• DocumentCode
    892780
  • Title

    A Software Technique for Diagnosing and Correcting Memory Errors

  • Author

    Liss, J.

  • Author_Institution
    IBM, Kingston
  • Volume
    35
  • Issue
    1
  • fYear
    1986
  • fDate
    4/1/1986 12:00:00 AM
  • Firstpage
    12
  • Lastpage
    18
  • Abstract
    A software diagnostic that eliminates 2-bit and some 3-bit errors is described. The diagnostic procedure tests memory for errors that cannot be corrected by the ECC (error-correcting code), which provides single-error correction and double-error detection. When an uncorrectable error is found, the diagnostic attempts to reduce it to a 1-bit error. This is done either by reconfiguring the memory to distribute failing bits across different ECC words or by replacing the failing chip with a spare. As a result, memory cards that previously had to be replaced can continue to function, and their life is prolonged. The diagnostic can also perform preventive maintenance when run in an alternate mode. In this mode, all combinations of the memory are tested to determine whether there is reserve. Reserve is defined as: 1) the capability of reconfiguring the card to obtain another functional state of memory (in addition to the current operational state), or 2) the availability of functional spare chips that have not been used. Preventive maintenance is performed by replacing cards that have no reserve. Memory operation can then continue error-free.
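The reconfiguration idea in the abstract can be illustrated with a small sketch. The abstract does not say how the card is reconfigured, so everything below is an assumption made for illustration: each ECC word takes one bit from each chip, and a hypothetical per-chip XOR mask on the row address decides which word a given cell lands in. The names `errors_per_word`, `is_correctable`, `find_configurations`, and `has_reserve` are invented here and are not taken from the paper.

```python
from itertools import product


def errors_per_word(faults, masks):
    """Count failing bits per ECC word.

    faults: set of (chip, row) failing cells.
    masks:  hypothetical per-chip XOR masks on the row address.
    A faulty cell (chip, row) is assumed to land in word row ^ masks[chip].
    """
    counts = {}
    for chip, row in faults:
        word = row ^ masks[chip]
        counts[word] = counts.get(word, 0) + 1
    return counts


def is_correctable(faults, masks):
    """SEC-DED can correct a word only if it holds at most one failing bit."""
    return all(n <= 1 for n in errors_per_word(faults, masks).values())


def find_configurations(faults, n_chips, n_rows):
    """Exhaustively list mask settings that leave every word correctable.

    n_rows must be a power of two so the XOR stays in range. The mask of
    chip 0 is fixed at 0 so that pure relabelings of the words are not
    counted as distinct configurations.
    """
    return [
        (0,) + rest
        for rest in product(range(n_rows), repeat=n_chips - 1)
        if is_correctable(faults, (0,) + rest)
    ]


def has_reserve(faults, n_chips, n_rows, current, spares=0):
    """Reserve as defined in the abstract: another functional configuration
    besides the current one, or at least one unused functional spare chip."""
    others = [
        m for m in find_configurations(faults, n_chips, n_rows)
        if m != tuple(current)
    ]
    return bool(others) or spares > 0


# Two chips fail at the same row, so one word holds a 2-bit error.
faults = {(0, 2), (1, 2)}
print(is_correctable(faults, (0, 0, 0, 0)))  # uncorrectable as configured
print(is_correctable(faults, (0, 1, 0, 0)))  # failing bits now in different words
```

Under these assumptions, the preventive-maintenance mode corresponds to calling `has_reserve` for a card and flagging it for replacement when the result is false. The exhaustive search is only practical for toy sizes like the ones used here.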
  • Keywords
    Circuits; Costs; Error correction; Error correction codes; Failure analysis; Hardware; Preventive maintenance; Reliability engineering; Software design; Testing;
  • fLanguage
    English
  • Journal_Title
    Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9529
  • Type

    jour

  • DOI
    10.1109/TR.1986.4335331
  • Filename
    4335331