Title :
Architectural Considerations for Efficient Software Execution on Parallel Microprocessors
Author :
Vadlamani, Srinivas ; Jenks, Stephen
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., California Univ., Irvine, CA
Abstract :
Chip multiprocessors (CMPs) and simultaneous multithreading (SMT) processors provide high performance but put more pressure on the memory interface than their single-thread counterparts. The "memory wall" problem is exacerbated by multiple threads sharing a memory interface, and will get worse as more cores are added. Therefore, communications between cores, using shared caches or fast interconnects between private caches, are needed to keep the CPUs busy without burdening the memory interface. Multiple CMP systems add another dimension to this challenging problem, as the communication mechanism is no longer uniform. To parallelize data-intensive applications for high performance on these systems, one must explore a number of execution behaviors in a complex architecture-dependent exercise that entails identifying key components of the communication subsystem and understanding their behavior under varying workloads. As part of ongoing research into efficient program execution models for parallel microprocessors, we have developed a tool to evaluate the performance of the storage controllers at different levels of the memory hierarchy under varying workloads and measure cache coherence overhead. The tool allows exploration of architectural features of real processors that affect the performance of several parallel execution approaches. Here, we demonstrate its use by evaluating two of our parallel programming models that employ architecture-specific optimizations and compare them to a conventional model for several applications on parallel microprocessors.
Keywords :
cache storage; microprocessor chips; multi-threading; parallel architectures; SMT processors; architecture-specific optimizations; cache coherence; chip multiprocessors; data-intensive applications; memory interface; memory wall problem; multiple CMP systems; parallel microprocessors; parallel programming models; simultaneous multithreading processor; software execution; Cache storage; Communication system control; Computer science; Finite difference methods; Microprocessors; Multithreading; Software performance; Surface-mount technology; Time domain analysis; Yarn;
Conference_Title :
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location :
Long Beach, CA
Print_ISBN :
1-4244-0910-1
Electronic_ISBN :
1-4244-0910-1
DOI :
10.1109/IPDPS.2007.370294