Title :
Correlation Aware Technique for SQL to NoSQL Transformation
Author :
Jen-Chun Hsu ; Ching-Hsien Hsu ; Shih-Chang Chen ; Yeh-Ching Chung
Author_Institution :
Dept. Comput. Sci. & Inf. Eng., Chung Hua Univ., Hsinchu, Taiwan
Abstract :
For better efficiency in parallel and distributed computing, Apache Hadoop distributes imported data randomly across data nodes. This mechanism offers some advantages for general data analysis. Following the same concept, Apache Sqoop splits each table into four parts and distributes them randomly across data nodes. However, this data placement mechanism still raises a database performance concern. This paper proposes a Correlation Aware method on Sqoop (CA_Sqoop) to improve data placement. By gathering related data as close together as possible, CA_Sqoop reduces the data transformation cost on the network and improves performance in terms of database usage. CA_Sqoop also considers table correlation and size for better data locality and query efficiency. Simulation results show that the data locality of CA_Sqoop is two times better than that of the original Apache Sqoop.
Keywords :
SQL; parallel processing; public domain software; Apache Hadoop; Apache Sqoop concept; CA_Sqoop; NoSQL transformation; SQL transformation; correlation aware technique; data locality; data nodes; data placement mechanism; data transformation cost reduction; database performance; distributed computing; general data analysis; parallel computing; query efficiency; Cloud computing; Computer architecture; Correlation; Data processing; Distributed databases; File systems; Big Data; Data locality; NoSQL; Sqoop
Conference_Title :
Ubi-Media Computing and Workshops (UMEDIA), 2014 7th International Conference on
Conference_Location :
Ulaanbaatar
Print_ISBN :
978-1-4799-4267-1
DOI :
10.1109/U-MEDIA.2014.27