Regional Information Center for Science and Technology - Optimize Parallel Data Access in Big Data Processing

DocumentCode :

3078203

Title :

Optimize Parallel Data Access in Big Data Processing

Author :

Jiangling Yin ; Jun Wang

Author_Institution :

EECS Dept., Univ. of Central Florida, Orlando, FL, USA

fYear :

2015

fDate :

4-7 May 2015

Firstpage :

721

Lastpage :

724

Abstract :

In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many parallel big data processing systems, such as graph processing systems, MPI-based parallel programs, and Scala/Java-based Spark frameworks, which can efficiently support iterative and interactive data analysis in memory. The first part of my dissertation focuses on parallel data access on distributed file systems, e.g., HDFS. Since the distributed I/O resources and global data distribution are often not taken into consideration, the data requests from parallel processes/executors will unfortunately be served in a remote or imbalanced fashion on the storage servers. To address these problems, we develop I/O middleware systems and matching-based algorithms to map parallel data requests to storage servers such that local and balanced data access can be achieved. The last part of my dissertation presents our plans to improve the performance of interactive data access in big data analysis. Specifically, most interactive analysis programs scan through the entire data set regardless of which data is actually required. We plan to develop a content-aware method to quickly access required data without this laborious scanning process.
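
The abstract does not spell out the matching-based algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each data chunk has a known list of storage nodes holding a replica, each parallel process runs on one node, and every process may take at most `quota` chunks. The function name `match_requests` and all parameter names are hypothetical. It uses a standard augmenting-path bipartite matching so that as many chunks as possible are read locally while the load stays capped per process.

```python
def match_requests(chunk_replicas, processes_by_node, quota):
    """Map each chunk to one process, preferring local reads under a load cap.

    chunk_replicas:    dict chunk id -> list of node names holding a replica
    processes_by_node: dict node name -> list of process ids on that node
    quota:             maximum number of locally matched chunks per process
    Returns (assigned, remote): process -> chunk list, and chunks read remotely.
    """
    # Processes that could read each chunk from their own node.
    local = {
        chunk: [p for node in nodes for p in processes_by_node.get(node, [])]
        for chunk, nodes in chunk_replicas.items()
    }
    all_procs = [p for procs in processes_by_node.values() for p in procs]
    assigned = {p: [] for p in all_procs}

    def try_assign(chunk, visited):
        # Augmenting-path search: place the chunk on a local process,
        # relocating one of that process's chunks if it is already full.
        for proc in local[chunk]:
            if proc in visited:
                continue
            visited.add(proc)
            if len(assigned[proc]) < quota:
                assigned[proc].append(chunk)
                return True
            for other in list(assigned[proc]):
                if try_assign(other, visited):
                    assigned[proc].remove(other)
                    assigned[proc].append(chunk)
                    return True
        return False

    remote = [c for c in chunk_replicas if not try_assign(c, set())]

    # Assumed fallback: unmatched chunks go to the least-loaded process
    # and are read remotely (this step may exceed the quota).
    for chunk in remote:
        least_loaded = min(all_procs, key=lambda p: len(assigned[p]))
        assigned[least_loaded].append(chunk)
    return assigned, remote


# Hypothetical layout: two nodes, one process each, four chunks.
replicas = {"c0": ["n1", "n2"], "c1": ["n1"], "c2": ["n1"], "c3": ["n2"]}
procs = {"n1": ["p1"], "n2": ["p2"]}
assigned, remote = match_requests(replicas, procs, quota=2)
print(assigned, remote)
```

In this toy layout a greedy first-fit would fill `p1` with `c0` and `c1` and force `c2` to be read remotely; the augmenting step instead moves `c0` to `p2`, which also holds a replica, so every chunk is read locally and both processes carry two chunks.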

Keywords :

Big Data; Java; data analysis; input-output programs; interactive systems; middleware; parallel processing; HDFS; Hadoop distributed file system; IO middleware systems; MPI-based parallel programs; balanced data access; big data analysis; big data processing; content-aware method; distributed IO resources; global data distribution; graph processing systems; interactive data analysis; iterative data analysis; laborious scanning process; local data access; matching-based algorithms; optimize parallel data access; parallel big data processing systems; parallel data requests; scala-java-based Spark frameworks; storage servers; Bandwidth; Big data; Data visualization; Distributed databases; Middleware; Simultaneous localization and mapping;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on

Conference_Location :

Shenzhen

Type :

conf

DOI :

10.1109/CCGrid.2015.168

Filename :

7152541

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3078203