DocumentCode :
1190439
Title :
A concurrent test architecture for massively parallel computers and its error detection capability
Author :
Hancu, Marius V A ; Iwasaki, Kazuhiko ; Sato, Yuji ; Sugie, Mamoru
Author_Institution :
Centre de Recherche Inf. de Montreal, Que., Canada
Volume :
5
Issue :
11
fYear :
1994
fDate :
11/1/1994
Firstpage :
1169
Lastpage :
1184
Abstract :
Presents new principles for online monitoring in the context of multiprocessors (especially massively parallel processors) and then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or online monitoring) at the system level is accomplished by enforcing the run-time testing of the data and control dependences of the algorithm currently being executed on the parallel computer. To support this process, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the PEs is also possible. This procedure creates an image of the data dependences and of the control flow of the currently running algorithm. At the end of each computational block, this image is compared with a reference image created at compilation time. The main results of this work are new principles for the online system-level testing of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and an analytical model and analysis of the address compression process used for monitoring the data routing process.
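The block-based address compression described in the abstract can be illustrated with a minimal sketch. This is not the authors' compressor: the 16-bit signature width, the CRC-16-CCITT polynomial, and the names `compress_step`, `block_signature`, and `check_block` are all assumptions chosen for illustration. The sketch shows a source node folding the destination addresses of its outgoing messages into one signature per block, then comparing it with a reference signature that would be produced at compilation time.

```python
SIG_BITS = 16          # assumed signature width (not specified in the record)
SIG_MASK = 0xFFFF
POLY = 0x1021          # assumed feedback polynomial (CRC-16-CCITT)


def compress_step(signature: int, address: int) -> int:
    """Fold one address into the running signature (shift-and-XOR, LFSR style)."""
    signature ^= address & SIG_MASK
    for _ in range(SIG_BITS):
        msb = signature & 0x8000
        signature = (signature << 1) & SIG_MASK
        if msb:
            signature ^= POLY
    return signature


def block_signature(addresses) -> int:
    """Compress the whole address sequence of one computational block."""
    signature = 0
    for address in addresses:
        signature = compress_step(signature, address)
    return signature


def check_block(observed_addresses, reference_signature: int) -> bool:
    """Compare the run-time signature with the compile-time reference."""
    return block_signature(observed_addresses) == reference_signature


# Hypothetical block: the compiler expects messages to nodes 3, 7, 2 in order.
reference = block_signature([3, 7, 2])
ok = check_block([3, 7, 2], reference)         # correct routing sequence
misrouted = check_block([3, 7, 5], reference)  # last message sent to the wrong node
```

Because many address sequences map onto the same fixed-width signature, a faulty sequence can occasionally produce the reference value. That is the aliasing effect the abstract refers to. A wider signature generally lowers the chance of this, at the cost of more state per node.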
Keywords :
computer testing; error detection; network routing; parallel machines; probability; aliasing probability; block compressed sequence; compilation; computational block; concurrent instruction compression; concurrent test architecture; control dependences; control flow checking; data dependences; data routing process; massively parallel computers; message destination address; message source address; multiprocessors; online monitoring; online system-level testing; packet-switched routing; reference image; run-time testing; signature analysis; system level monitoring; Analytical models; Computer architecture; Computer errors; Computerized monitoring; Concurrent computing; Control systems; Image coding; Multiprocessing systems; Runtime; System testing
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.329671
Filename :
329671