Title :
On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores
Author :
Almaless, Ghassan ; Wajsburt, Franck
Author_Institution :
LIP6, UPMC, Paris, France
Abstract :
Single-chip cache-coherent multi-cores with up to 100 cores are now a reality, and many-cores with hundreds of cores are planned for the near future. This technological shift, undertaken by the high-end computer industry, is converging with the design motivations of other domains such as the embedded and HPC industries. In this paper, we investigate the scalability of the same four unmodified, shared-memory, image and signal processing parallel applications on two targets: (i) embedded: TSAR, a single-chip, 256-core cc-NUMA many-core simulated at Cycle-Accurate-Bit-Accurate level; and (ii) high-end: an AMD Opteron Interlagos 64-core cc-NUMA many-core. Besides our scalability results on both cc-NUMA targets, our contributions include two operating system mechanisms: (i) a distributed, client/server-based scheduler design that allows the kernel to offer scalable inter-thread synchronization mechanisms; and (ii) a kernel-level memory affinity technique named Auto-Next-Touch that allows the kernel to transparently and automatically migrate physical pages in order to enforce the locality of each thread's memory accesses. Although these two mechanisms are implemented and evaluated in ALMOS (Advanced Locality Management Operating System) running on the TSAR target, they remain applicable to other shared-memory operating systems.
Keywords :
image processing; multi-threading; operating systems (computers); shared memory systems; ALMOS; AMD Opteron Interlagos; CC-NUMA many-cores; HPC industries; advanced locality management operating system; auto-next-touch; cycle-accurate-bit-accurate simulation; embedded TSAR; high-end computer-industry; image processing parallel applications; interthreads synchronization mechanisms scalability; kernel-level memory affinity technique; scheduler design; shared-memory applications; signal processing oriented parallel applications; single-chip 256-cores; single-chip cache-coherent multicores; thread memory accesses locality; Instruction sets; Kernel; Linux; Resource management; Scalability; Servers;
Conference_Title :
Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on
Conference_Location :
Karlsruhe
Print_ISBN :
978-1-4673-2089-4
Electronic_ISBN :
978-2-9539987-4-0