DocumentCode :
1151384
Title :
A highly OR-parallel inference machine (Multi-ASCA) and its performance evaluation: an architecture and its load balancing algorithms
Author :
Naganuma, Jiro ; Ogura, Takeshi
Author_Institution :
NTT LSI Labs., Kanagawa, Japan
Volume :
43
Issue :
9
fYear :
1994
fDate :
9/1/1994
Firstpage :
1062
Lastpage :
1075
Abstract :
An architecture and four load balancing algorithms for a highly OR-parallel inference machine are proposed, and the machine's performance is evaluated in a trace-driven simulation study. This inference machine consists of a large number of processing elements (PEs) with serial I/O links, directly connected to each other in a simply modified mesh network. Each PE is a high-speed sequential Prolog processor with its own local memory. The activity of all PEs is locally controlled by four new load balancing algorithms based on purely local communication. Communication is allowed only between directly connected PEs. These load balancing algorithms reduce the communication overhead of load balancing and make it possible to accomplish highly OR-parallel execution. A software simulator using a trace-driven simulation technique based on an inference tree has been developed, and some typical OR-parallel benchmarks such as the n-queens problem have been simulated on it. The average communication per load balancing operation is reduced by a factor ranging from 1/30 to 1/100 by the interaction of these load balancing algorithms as compared with a conventional copying method. The inference machine (1024 PEs; 32×32 array) attains a 300-600 times parallel speedup, assuming a 1 MLIPS (mega logical inferences per second) PE and 20 MBPS (mega bits per second) serial I/O links, which could be easily integrated on a single chip using current VLSI technology. This highly OR-parallel inference machine promises to be an important step towards the realization of a high-performance artificial intelligence system.
Keywords :
PROLOG; inference mechanisms; parallel architectures; parallel machines; performance evaluation; resource allocation; virtual machines; 20 Mbit/s; Multi-ASCA; OR-parallel benchmarks; VLSI; communication overhead; copying method; high-performance artificial intelligence system; high-speed sequential Prolog processor; highly OR-parallel inference machine; inference tree; load balancing algorithms; local communication; local memory; locally controlled activity; modified mesh network; n-queens problem; nonshared memory multiprocessor system; parallel architecture; processing elements; serial I/O links; software simulator; trace-driven simulation; Artificial intelligence; Communication system control; Inference algorithms; Lips; Load management; Logic arrays; Logic programming; Mesh networks; Multiprocessing systems; Very large scale integration
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.312115
Filename :
312115