DocumentCode :
3475348
Title :
Improving network performance through task duplication for parallel applications on clusters
Author :
Qin, Xiao
Author_Institution :
Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, NM, USA
fYear :
2005
fDate :
7-9 April 2005
Firstpage :
35
Lastpage :
42
Abstract :
While data replication is widely used in clusters to provide fault tolerance, it can place heavy stress on communication networks and degrade the overall performance of parallel applications. This degradation is particularly unacceptable for disk-write-intensive applications. As a result, managing data duplication for parallel applications running on clusters is a significant and urgent challenge. This paper presents the design, implementation, and evaluation of TUFF, a network-aware task duplication management system in which redundant data is regenerated by corresponding duplicate tasks rather than replicated directly over the network. In addition, TUFF can improve the availability of parallel applications, because it allows two replicas of each I/O-intensive task to be executed on two different nodes. We have implemented and evaluated TUFF using extensive simulations under a diverse set of workload conditions. Experimental results show that TUFF improves the overall performance of parallel applications running on clusters by efficiently reducing network resource consumption.
Keywords :
computer network management; computer network reliability; fault tolerant computing; parallel processing; performance evaluation; workstation clusters; I/O-intensive task; TUFF; availability; data duplication management; disk-write-intensive application; fault tolerance; network clusters; parallel applications; performance degradation; stress communication networks; Application software; Communication networks; Computer science; Costs; Degradation; Fault tolerance; Fault tolerant systems; File systems; Middleware; Redundancy
fLanguage :
English
Publisher :
IEEE
Conference_Title :
24th IEEE International Performance, Computing, and Communications Conference (IPCCC 2005)
ISSN :
1097-2641
Print_ISBN :
0-7803-8991-3
Type :
conf
DOI :
10.1109/PCCC.2005.1460511
Filename :
1460511