Title :
Exploring the use of Hyper-Threading technology for multimedia applications with Intel® OpenMP compiler
Author :
Tian, Xinmin ; Chen, Yen-Kuang ; Girkar, Milind ; Ge, Steven ; Lienhart, Rainer ; Shah, Sanjiv
Abstract :
Processors with Hyper-Threading technology can improve application performance by letting a single processor behave as if it were two, executing instructions from different threads in parallel rather than serially. However, the potential performance improvement can be obtained only if an application is multithreaded through parallelization techniques. This paper presents the threaded code generation and optimization techniques in the Intel® C++/Fortran compiler. We conduct a performance study of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel compiler on Hyper-Threading technology (HT) enabled Intel single-processor and multi-processor systems. Our performance results show that the multithreaded code generated by the Intel compiler achieves speedups of up to 1.28x on an HT-enabled single-CPU system and up to 2.23x on an HT-enabled dual-CPU system. By measuring IPC (Instructions Per Cycle), UPC (Uops Per Cycle) and cache misses for both serial and multithreaded execution of each multimedia application, we make three key observations: (a) the multithreaded code generated by the Intel compiler yields a good performance gain when parallelization is guided by OpenMP pragmas or directives; (b) exploiting thread-level parallelism (TLP) causes inter-thread interference in caches and places greater demands on the memory system; however, Hyper-Threading technology hides the additional latency, so the impact on whole-program performance is small; (c) Hyper-Threading technology is effective at exploiting both the task- and data-parallelism inherent in multimedia applications.
Keywords :
C++ language; FORTRAN; cache storage; microprocessor chips; multi-threading; multimedia computing; optimising compilers; parallelising compilers; performance evaluation; C++/Fortran compiler; Hyper-Threading technology; IPC; Instructions Per Cycle; Intel OpenMP compiler; Intel single-processor systems; OpenMP directives; OpenMP pragmas; TLP; UPC; Uops Per Cycle; cache misses; data-parallelism; inter-thread interference; latency; multi-processor systems; multimedia applications; optimization techniques; parallelization techniques; performance; speedups; task-parallelism; thread-level parallelism; threaded code generation; Application software; Delay; Microprocessors; Multimedia systems; Optimizing compilers; Performance gain; Software performance; Support vector machines; Surface-mount technology; Yarn;
Conference_Title :
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
Print_ISBN :
0-7695-1926-1
DOI :
10.1109/IPDPS.2003.1213118