DocumentCode :
167651
Title :
Model-Driven Data Layout Selection for Improving Read Performance
Author :
Jialin Liu ; Surendra Byna ; Bin Dong ; Kesheng Wu ; Yong Chen
Author_Institution :
Dept. of Comput. Sci., Texas Tech Univ., Lubbock, TX, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1708
Lastpage :
1716
Abstract :
The performance of reading scientific data from a parallel file system depends on how the data is organized on physical storage devices. Data is often immutable after its producers, such as large-scale simulations, experiments, and observations, write it to the parallel file system. As a result, data analysis tasks often read slowly when the read pattern does not conform to the original organization of the data. For example, reading small noncontiguous chunks of data from a large array is many times slower than reading the same amount of data in contiguous chunks. To improve read performance during the analysis phase, we are developing the Scientific Data Services (SDS) framework, which automatically reorganizes previously written data to conform to known read patterns. In this paper, we introduce a model-driven strategy for selecting the data layouts that benefit the performance of different read patterns. We have developed a parallel I/O model based on the striping parameters of the Lustre file system and the block-level striping on RAID-based disks within an Object Storage Target (OST) of Lustre. We applied the model to reorganize large 3D array datasets on a Cray XE6 platform and achieved 9X to 128X improvement in accessing the reorganized data compared to reading the data in its original layout.
Keywords :
data analysis; input-output programs; parallel processing; storage management; Cray XE6 platform; Lustre file system; OST; RAID-based disks; SDS framework; block-level striping; data analysis; model-driven data layout selection; object storage target; parallel I/O model; parallel file system; physical storage devices; read performance; scientific data reading; scientific data services framework; Arrays; Computational modeling; Data models; Distributed databases; Layout; Organizations; Predictive models; Big Data; I/O Performance Model; Scientific Data Management; Scientific Data Services (SDS); high performance computing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.190
Filename :
6969581