Title :
High performance and grid computing with quality of service control
Author :
Sait, S.M. ; Al-Shaikh, Raed
Author_Institution :
Center for Communications & IT Research, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
fDate :
June 30, 2014 - July 2, 2014
Abstract :
At the time of writing, existing High Performance Computing (HPC) systems do not provide proper quality of service (QoS) controls and reliability features because of two limitations. First, standard middleware libraries such as Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) give applications no means to specify service quality for computation and communication. Second, modern high-speed interconnects such as InfiniBand, Myrinet and Quadrics are optimized for performance rather than for fault tolerance and QoS control. The Data-Centric Publish-Subscribe (DCPS) model, the core of Data Distribution Service (DDS) systems, defines standards that enable applications running on heterogeneous platforms to control various QoS policies in a net-centric system. In this paper, we present our novel model for incorporating DDS QoS and reliability controls into HPC systems. Our results show that integrating DDS into HPC adds considerable overhead in terms of performance and network utilization when the application is dominated by communication.
Keywords :
fault tolerance; grid computing; middleware; parallel processing; quality of service; DCPS model; DDS QoS; DDS integration; DDS systems; HPC systems; MPI; Myrinet; PVM; QoS controls; Quadrics; data distribution service systems; data-centric publish-subscribe model; fault-tolerance control; grid computing; high performance computing systems; message passing interface; net-centric system; parallel virtual machine; quality-of-service control; reliability features; service quality; standard middleware libraries; Computational modeling; Hardware; Middleware; Quality of service; Reliability; Scalability; Standards; HPC; MPI; Middleware; QoS;
Conference_Title :
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014 15th IEEE/ACIS International Conference on
Conference_Location :
Las Vegas, NV
DOI :
10.1109/SNPD.2014.6888711