Regional Information Center for Science and Technology (RICEST) - On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores

DocumentCode :

580497

Title :

On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores

Author :

Almaless, Ghassan ; Wajsburt, Franck

Author_Institution :

LIP6, UPMC, Paris, France

fYear :

2012

fDate :

23-25 Oct. 2012

Firstpage :

Lastpage :

Abstract :

Nowadays, single-chip cache-coherent multi-cores of up to 100 cores are a reality, and many-cores of hundreds of cores are planned in the near future. This technological shift undertaken by the high-end computer industry is converging with the design motivations of other domains, such as the embedded and HPC industries. In this paper, we investigate the scalability of the same four unmodified, shared-memory, image and signal processing oriented parallel applications on two targets: (i) embedded - TSAR, a single-chip, 256-core, cc-NUMA many-core simulated at Cycle-Accurate-Bit-Accurate level; and (ii) high-end - an AMD Opteron Interlagos, 64-core, cc-NUMA many-core. Besides our scalability results on both cc-NUMA targets, our contributions include two operating system mechanisms: (i) a distributed, client/server based scheduler design allowing the kernel to offer scalable inter-thread synchronization mechanisms; and (ii) a kernel-level memory affinity technique named Auto-Next-Touch allowing the kernel to transparently and automatically migrate physical pages in order to enforce the locality of threads' memory accesses. Although these two mechanisms are implemented and evaluated in ALMOS (Advanced Locality Management Operating System) running on the TSAR target, they remain applicable to other shared-memory operating systems.

Keywords :

image processing; multi-threading; operating systems (computers); shared memory systems; ALMOS; AMD Opteron Interlagos; CC-NUMA many-cores; HPC industries; advanced locality management operating system; auto-next-touch; cycle-accurate-bit-accurate simulation; embedded TSAR; high-end computer-industry; image processing parallel applications; interthreads synchronization mechanisms scalability; kernel-level memory affinity technique; scheduler design; shared-memory applications; signal processing oriented parallel applications; single-chip 256-cores; single-chip cache-coherent multicores; thread memory accesses locality; Instruction sets; Kernel; Linux; Resource management; Scalability; Servers;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on

Conference_Location :

Karlsruhe

Print_ISBN :

978-1-4673-2089-4

Electronic_ISBN :

978-2-9539987-4-0

Type :

conf

Filename :

6385369

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=580497