• DocumentCode
    2642676
  • Title

    Highly fault-tolerant parallel computation

  • Author

    Spielman, Daniel A.

  • Author_Institution
    Dept. of Math., MIT, Cambridge, MA, USA
  • fYear
    1996
  • fDate
    14-16 Oct 1996
  • Firstpage
    154
  • Lastpage
    163
  • Abstract
    We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model using wlog0(1)w processors and time tlog0(1)w. The failure probability of the computation will be at most t·exp(-w¼). The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in O(nlog0(1)n) sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead
  • Keywords
    Reed-Solomon codes; fault tolerant computing; parallel algorithms; coded computation; coded model; error-correcting code; failure probability; fault-tolerant computation; fault-tolerant parallel computation; generalized Reed-Solomon codes; parallel computation; Circuit faults; Circuit simulation; Computational modeling; Concurrent computing; Decoding; Error correction codes; Fault tolerance; Mathematics; Protection; Reed-Solomon codes;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Foundations of Computer Science, 1996. Proceedings., 37th Annual Symposium on
  • Conference_Location
    Burlington, VT
  • ISSN
    0272-5428
  • Print_ISBN
    0-8186-7594-2
  • Type

    conf

  • DOI
    10.1109/SFCS.1996.548474
  • Filename
    548474