DocumentCode :
652358
Title :
Detecting Associations in Large Dataset on MapReduce
Author :
Dong Dai ; Xi Li ; Chao Wang ; Junneng Zhang ; Xuehai Zhou
Author_Institution :
Comput. Sci. Coll., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
1788
Lastpage :
1794
Abstract :
In daily life, we are surrounded by all kinds of data. Finding the relationships among these data has become one of the greatest challenges facing data scientists. In 2011, David N. Reshef et al. took a great leap toward solving this problem. They demonstrated that the maximal information coefficient (MIC) is an effective tool for detecting different kinds of relationships between any given pair of variables, whether or not these relationships are functional. However, challenges remain: the computation is too complex and time-consuming for large datasets, which makes the algorithm impractical in real-world use. In this paper, we explore possible ways to parallelize the detection of associations between variables in large datasets, and propose a high-performance MapReduce-based solution that includes a data storage pattern, preprocessing algorithms, a distributed memory cache mechanism, and a series of MapReduce jobs. The experiments show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness. This work makes the well-known MIC algorithm more practical for solving real problems.
Keywords :
data handling; parallel processing; MapReduce based solution; MapReduce jobs; association detection; data storage pattern; distributed memory cache mechanism; large dataset; maximal information coefficient; mic algorithm; parallel solution; preprocessing algorithms; Complexity theory; Microwave integrated circuits; Mutual information; Parallel algorithms; Partitioning algorithms; Servers; Vectors; Associations; Distributed Algorithm; MapReduce; information theory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.222
Filename :
6681053