DocumentCode :
3716765
Title :
AnkaCom: A Development and Experiment for Extreme Scale Computing
Author :
Yasin Celik;Aakash Pradeep;Justin Y. Shi
Author_Institution :
Dept. of Comput. &
fYear :
2015
Firstpage :
2010
Lastpage :
2016
Abstract :
Extreme scale computing has no implied scaling limit. The impossibility of implementing reliable communication between crashing hosts prohibits explicit-communication primitives in extreme scale applications. However, the alternatives are not well investigated. This paper reports the development of, and experiments with, a prototype implicit communication system called AnkaCom, built on a Statistic Multiplexed Computing (SMC) framework. The key concept of the SMC framework is total decoupling between application logical entities and physical resources such as processors, networks, databases, and storage. This separation allows the integration of an SMC control layer that can automatically deliver mission-critical services, leveraging the maximal capabilities of the unreliable resources at runtime. The SMC control layer operates according to the principles of HDLC protocols, which have been proven capable of delivering exactly-once reliable services over unreliable lower-level services. AnkaCom is an experimental SMC parallel processing system, implemented in Java as a distributed tuple switch network (TSN) with in-memory replication capabilities. We compare the computational results of an AnkaCom application against a "bare metal" C/MPI application using a textbook dense matrix multiplication example. We show that a granularity-optimized AnkaCom application can consistently outperform the C/MPI application on a dedicated cluster of 12-120 cores. We also report the effects of in-memory real-time data replication, which show only marginal changes in elapsed time with double and triple runtime tuple replicas.
Keywords :
"Peer-to-peer computing","Reliability","Computer crashes","Switches","Scalability","Logic gates","Protocols"
Publisher :
IEEE
Conference_Title :
Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/CIT/IUCC/DASC/PICOM.2015.298
Filename :
7363344