DocumentCode :
3200086
Title :
Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems
Author :
Jiangling Yin ; Jun Wang ; Jian Zhou ; Tyler Lukasiewicz ; Dan Huang ; Junyao Zhang
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
623
Lastpage :
632
Abstract :
In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure whose edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of this graph, so as to maximize the degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.
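The matching idea in the abstract can be illustrated with a small sketch. This is not the Opass algorithm itself, which the abstract does not spell out; it is a plain capacitated bipartite matching (augmenting paths) under assumptions made here for illustration: each process runs on one node, `replicas[c]` lists the processes co-located with a replica of chunk `c`, and every process may take at most an equal share of the chunks. The names `assign_chunks`, `replicas`, and `quota` are hypothetical.

```python
from math import ceil


def assign_chunks(replicas, num_procs):
    """Assign each chunk to one process, preferring local replicas.

    replicas: dict mapping chunk id -> set of process ids that hold a
              local replica (an assumed input format, for illustration).
    Returns (owner, local_count), where owner maps chunk -> process.
    """
    quota = ceil(len(replicas) / num_procs)   # balanced per-process share
    owner = {}                                # chunk -> process
    held = {p: [] for p in range(num_procs)}  # process -> assigned chunks

    def try_place(chunk, visited):
        # Search for an augmenting path that places `chunk` locally.
        for p in sorted(replicas[chunk]):
            if p in visited:
                continue
            visited.add(p)
            if len(held[p]) < quota:
                held[p].append(chunk)
                owner[chunk] = p
                return True
            # p is full: try to move one of its chunks to another local process.
            for other in list(held[p]):
                if try_place(other, visited):
                    held[p].remove(other)
                    held[p].append(chunk)
                    owner[chunk] = p
                    return True
        return False

    for chunk in replicas:
        try_place(chunk, set())

    # Chunks with no feasible local slot are read remotely by the
    # least-loaded process, which keeps the load balanced.
    for chunk in replicas:
        if chunk not in owner:
            p = min(held, key=lambda q: len(held[q]))
            held[p].append(chunk)
            owner[chunk] = p

    local_count = sum(1 for c, p in owner.items() if p in replicas[c])
    return owner, local_count
```

For example, with `replicas = {0: {0, 1}, 1: {0}, 2: {0}}` and two processes, chunk 0 is first placed on process 0 and later moved to process 1 so that chunk 2 can also be read locally. Every chunk ends up local and no process exceeds its quota of two.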
Keywords :
data structures; distributed databases; graph theory; pattern clustering; Hadoop file system; Opass; PRObEs Marmot 128-node cluster testbed; cluster-storage nodes; data locality; data transfer contention; disk access; distributed file systems; graph data structure; load capacity; matching-based algorithms; parallel applications; parallel data access; parallel data access strategies; parallel data read requests; parallel data-intensive analysis; Computational modeling; Data analysis; Distributed databases; File systems; Optimization; Parallel processing; Bipartite Matching; Distributed File Systems; Parallel Data Access;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.55
Filename :
7161550