A Study of the Effect of Partitioning on Parallel Simulation of Multicore Systems

Author

Zhenjiang Dong ; Jun Wang ; Riley, George F. ; Yalamanchili, Sudhakar

Author_Institution

Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA

fYear

2013

fDate

14-16 Aug. 2013

Firstpage

375

Lastpage

379

Abstract

There has been little research on the effect of partitioning on parallel simulation of multicore systems. This paper presents our study of this important problem in the context of the null-message-based synchronization algorithm for parallel multicore simulation. We focus on coarse-grain parallel simulation, where each core and its cache slices are modeled within a single logical process (LP) and different partitioning schemes are applied only to the interconnection network. We show that encapsulating the entire on-chip interconnection network in a single logical process is an impediment to scalable simulation. This baseline partitioning and two other schemes are investigated. Experiments are conducted on a subset of the PARSEC benchmarks with 16-, 32-, 64- and 128-core models. Results show that the partitioning scheme has a significant impact on simulation performance and parallel efficiency. Beyond a certain system scale, one scheme consistently outperforms the other two, and the gaps in performance and efficiency widen as the size of the model increases, reaching up to 4.1 times faster simulation and 277% better efficiency for 128-core models. We explain the reasons for this behavior, which can be traced to the features of the null-message-based synchronization algorithm. Because of this, we believe that a component whose number of inter-LP interactions grows with system size should be partitioned into several sub-components to achieve better performance.
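
The scaling argument in the abstract can be illustrated with a small sketch. In null-message synchronization, an LP derives its safe simulation time from all of its input channels and sends null messages on all of its output channels, so its synchronization work grows with its channel count. The code below counts the channels attached to the busiest network LP under an assumed toy topology: cores are split evenly across network LPs, and the network LPs form a chain. The function name and the topology are hypothetical and are not the partitioning schemes evaluated in the paper.

```python
import math


def max_channels_per_lp(num_cores: int, num_network_lps: int) -> int:
    """Largest number of inter-LP channels attached to any network LP.

    Assumed toy topology (not taken from the paper): cores are spread
    evenly across the network LPs, and the network LPs form a chain in
    which each LP links only to its immediate neighbours.
    """
    cores_per_lp = math.ceil(num_cores / num_network_lps)
    # A lone LP has no neighbours; chain ends have one, interior LPs two.
    neighbour_links = 0 if num_network_lps == 1 else min(2, num_network_lps - 1)
    return cores_per_lp + neighbour_links


if __name__ == "__main__":
    for cores in (16, 32, 64, 128):
        single = max_channels_per_lp(cores, 1)           # whole network in one LP
        split = max_channels_per_lp(cores, cores // 4)   # assumed: 4 cores per network LP
        print(f"{cores:>3} cores: single LP = {single:>3} channels, "
              f"partitioned = {split} channels")
```

Under these assumptions, the single-LP network has one channel per core, so its channel count grows linearly with the model size, while each partitioned network LP keeps a constant number of channels. This is only a channel count, not a performance model.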

Keywords

cache storage; discrete event simulation; multiprocessing systems; multiprocessor interconnection networks; parallel architectures; performance evaluation; synchronisation; 128-core model; 16-core model; 32-core model; 64-core model; PARSEC benchmarks; baseline partitioning; cache slices; coarse grain parallel simulation; interLP interactions; logical process; null-message-based synchronization algorithm; on-chip interconnection network; parallel efficiency; parallel multicore simulation; Benchmark testing; Computational modeling; Manifolds; Multicore processing; Multiprocessor interconnection; Partitioning algorithms; Synchronization; multicore system; null message algorithm; parallel simulation; partitioning;

fLanguage

English

Publisher

IEEE

Conference_Titel

Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2013 IEEE 21st International Symposium on

Conference_Location

San Francisco, CA

ISSN

1526-7539

Type

conf

DOI

10.1109/MASCOTS.2013.55

Filename

6730790