Regional Information Center for Science and Technology - Model-Driven Data Layout Selection for Improving Read Performance

DocumentCode :

167651

Title :

Model-Driven Data Layout Selection for Improving Read Performance

Author :

Jialin Liu ; Byna, Surendra ; Bin Dong ; Kesheng Wu ; Yong Chen

Author_Institution :

Dept. of Comput. Sci., Texas Tech Univ., Lubbock, TX, USA

fYear :

2014

fDate :

19-23 May 2014

Firstpage :

1708

Lastpage :

1716

Abstract :

The performance of reading scientific data from a parallel file system depends on how the data is organized on the physical storage devices. Data is often immutable after its producers, such as large-scale simulations, experiments, and observations, write it to the parallel file system. As a result, data analysis tasks often read slowly when the read pattern does not conform to the original organization of the data. For example, reading small noncontiguous chunks of data from a large array is many times slower than reading the same amount of data in contiguous chunks. To improve read performance during the analysis phase, we are developing the Scientific Data Services (SDS) framework, which automatically reorganizes previously written data to conform to known read patterns. In this paper, we introduce a model-driven strategy for selecting the data layouts that benefit the performance of different read patterns. We have developed a parallel I/O model based on the striping parameters of the Lustre file system and the block-level striping on RAID-based disks within an Object Storage Target (OST) of Lustre. We have applied the model to reorganize large 3D array datasets on a Cray XE6 platform and achieved 9X to 128X improvement in accessing the reorganized data compared to reading the data in its original layout.
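
The contrast between contiguous and noncontiguous reads described in the abstract can be illustrated with a minimal Python sketch. It is not the I/O model from the paper. It assumes a row-major 3D array of 8-byte elements stored in a single file that is striped round-robin across OSTs, and it ignores the RAID level inside each OST. The function names, the 64x64x64 array shape, the 1 MiB stripe size, and the stripe count of 4 are all hypothetical values chosen for illustration.

```python
from itertools import product


def row_runs(shape, start, count, itemsize=8):
    """Yield (byte_offset, byte_length) for each row segment of a subarray
    read from a row-major 3D array (assumed layout, not from the paper)."""
    nz, ny, nx = shape
    z0, y0, x0 = start
    cz, cy, cx = count
    for z, y in product(range(z0, z0 + cz), range(y0, y0 + cy)):
        yield ((z * ny + y) * nx + x0) * itemsize, cx * itemsize


def merge_runs(runs):
    """Merge segments that are adjacent in the file into single runs.
    Expects segments in increasing offset order, as row_runs produces."""
    merged = []
    for offset, length in runs:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged


def osts_touched(runs, stripe_size, stripe_count):
    """Map runs onto round-robin stripes. Return the set of OST indices
    touched and the number of per-stripe pieces the read splits into."""
    osts, pieces = set(), 0
    for offset, length in runs:
        first = offset // stripe_size
        last = (offset + length - 1) // stripe_size
        for stripe in range(first, last + 1):
            osts.add(stripe % stripe_count)
            pieces += 1
    return osts, pieces


shape = (64, 64, 64)          # hypothetical array dimensions
stripe_size = 1 << 20         # hypothetical 1 MiB stripe size
stripe_count = 4              # hypothetical number of OSTs

# One full z-plane: 4096 elements stored back to back.
slab = merge_runs(row_runs(shape, (0, 0, 0), (1, 64, 64)))
# One element from every row: 4096 elements scattered through the file.
column = merge_runs(row_runs(shape, (0, 0, 0), (64, 64, 1)))

slab_osts, slab_pieces = osts_touched(slab, stripe_size, stripe_count)
column_osts, column_pieces = osts_touched(column, stripe_size, stripe_count)
```

Both reads request the same number of bytes, but under these assumptions the plane collapses into a single contiguous run while the column read stays split into thousands of small pieces. A layout selection strategy would presumably favor the layout that minimizes such fragmentation for the known read pattern.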

Keywords :

data analysis; input-output programs; parallel processing; storage management; Cray XE6 platform; Lustre file system; OST; RAID-based disks; SDS framework; block-level striping; data analysis; model-driven data layout selection; object storage target; parallel I/O model; parallel file system; physical storage devices; read performance; scientific data reading; scientific data services framework; Arrays; Computational modeling; Data models; Distributed databases; Layout; Organizations; Predictive models; Big Data; I/O Performance Model; Scientific Data Management; Scientific Data Services (SDS); high performance computing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International

Conference_Location :

Phoenix, AZ

Print_ISBN :

978-1-4799-4117-9

Type :

conf

DOI :

10.1109/IPDPSW.2014.190

Filename :

6969581

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=167651