Abstract:
Summary form only given. Emerging many-core-on-a-chip designs have renewed intense interest in parallel processing. By applying Amdahl's formulation to the programs in the PARSEC and SPLASH-2 benchmark suites, we find that most applications may not have sufficient parallelism to utilize modern parallel machines efficiently. The long sequential portions of these applications are caused by computation latency as well as communication latency. However, value prediction techniques may allow the “parallelization” of the sequential portion by predicting values before they are produced. In conventional superscalar architectures, computation latency dominates the sequential sections, so value prediction may be used to predict a computation result before it is produced. In many-core architectures, communication latency increases with the number of cores, so value prediction may be used to reduce both communication and computation latency. We extend these ideas by using GPUs to accelerate programs that contain limited parallelism and those that are hard to parallelize.
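
To illustrate why a long sequential portion limits utilization, the following is a minimal sketch of the standard Amdahl bound, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of execution time and n is the number of cores. The function names and the sample fractions are assumptions chosen for illustration; they are not measurements from PARSEC or SPLASH-2 and do not reproduce the analysis summarized above.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of single-core run time that can be parallelized (0..1).
    n_cores: number of cores applied to the parallel share (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def parallel_efficiency(parallel_fraction: float, n_cores: int) -> float:
    """Speedup per core; values well below 1 mean the cores are mostly idle."""
    return amdahl_speedup(parallel_fraction, n_cores) / n_cores


if __name__ == "__main__":
    # Hypothetical parallel fractions, not benchmark data.
    for p in (0.50, 0.90, 0.99):
        for n in (4, 64, 1024):
            s = amdahl_speedup(p, n)
            e = parallel_efficiency(p, n)
            print(f"p={p:.2f} n={n:5d} speedup={s:8.2f} efficiency={e:6.3f}")
```

In this model the speedup can never exceed 1 / (1 - p), however many cores are added. Any technique that shortens the sequential portion, such as the value prediction described above, raises that ceiling by shrinking the 1 - p term.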