DocumentCode :
172872
Title :
Multiple Two-Phase Data Processing with MapReduce
Author :
Hsiang Huang Wu ; Tse Chen Yeh ; Chien Min Wang
Author_Institution :
Inst. of Inf. Sci., Taipei, Taiwan
fYear :
2014
fDate :
June 27-July 2, 2014
Firstpage :
352
Lastpage :
359
Abstract :
MapReduce, proposed as a programming model, has been widely adopted for text processing over large datasets because it can exploit distributed resources to process large-scale data. Owing to its simplicity and scalability, this success suggests that MapReduce could make Big Data processing through cloud computing widely available. Nevertheless, this promise is accompanied by the difficulty of fitting applications into MapReduce, because MapReduce is limited to applications in which every input key-value pair is independent of the others. In this paper, we extend the applicability of MapReduce by allowing dependence within a set of input key-value pairs while preserving independence among all sets. This new modeling paradigm shifts MapReduce from processing independent input key-value pairs to processing independent sets. However, this broader applicability raises an intricate problem: how the two-stage processing structure inherent in MapReduce should handle the dependence within a set of input key-value pairs. To tackle this problem, we propose a design pattern called two-phase data processing. It expresses an application in two phases, not only to match the two-stage processing structure but also to exploit the power of MapReduce through cooperation between mappers and reducers. In addition, we present a design methodology, multiple two-phase data processing, that offers guidance on processing the independent sets. An experiment on background subtraction, a part of video surveillance, shows that the new modeling paradigm broadens the possibilities of MapReduce and demonstrates how our design methodology guides applications toward implementation.
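As a rough illustration of keeping dependence inside a set while the sets stay independent of one another, the following Python sketch simulates a MapReduce job in memory. It is not the authors' implementation: the record layout (video_id, frame_index, pixels), the function names, the running-average background model, and the alpha and threshold parameters are all hypothetical choices made for illustration. The map phase does only per-record work and keys each record by its set; the reduce phase then handles the ordered, dependent work within one set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (video_id, frame_index, pixel_values).
# Each video is one independent "set"; frames inside a video depend on each other.
Record = Tuple[str, int, List[float]]
Frame = Tuple[int, List[float]]


def map_phase(record: Record) -> Tuple[str, Frame]:
    """Phase 1: independent per-record work, keyed by the set the record belongs to."""
    video_id, frame_index, pixels = record
    # Per-frame work that needs no other frame: scale 8-bit values to [0, 1].
    normalized = [p / 255.0 for p in pixels]
    return video_id, (frame_index, normalized)


def shuffle(pairs: Iterable[Tuple[str, Frame]]) -> Dict[str, List[Frame]]:
    """Group intermediate pairs by set id, as the MapReduce framework would."""
    groups: Dict[str, List[Frame]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(frames: List[Frame], alpha: float = 0.5,
                 threshold: float = 0.2) -> List[List[int]]:
    """Phase 2: sequential, dependent work within one set (hypothetical
    running-average background subtraction)."""
    # Restore temporal order; the shuffle does not guarantee it.
    ordered = sorted(frames, key=lambda f: f[0])
    background = list(ordered[0][1])
    masks: List[List[int]] = []
    for _, pixels in ordered:
        # Mark a pixel as foreground when it differs from the current background.
        masks.append([1 if abs(p - b) > threshold else 0
                      for p, b in zip(pixels, background)])
        # Update the background estimate; this carries state across frames.
        background = [alpha * p + (1 - alpha) * b
                      for p, b in zip(pixels, background)]
    return masks


def run_job(records: Iterable[Record]) -> Dict[str, List[List[int]]]:
    groups = shuffle(map_phase(r) for r in records)
    # Sets are independent, so a real job could reduce them on different nodes.
    return {video_id: reduce_phase(frames) for video_id, frames in groups.items()}


if __name__ == "__main__":
    records = [
        ("cam1", 1, [0, 0]),
        ("cam2", 0, [100, 100]),
        ("cam1", 0, [0, 0]),
        ("cam1", 2, [255, 0]),
    ]
    print(run_job(records))
```

In this sketch the out-of-order frames of "cam1" are regrouped and re-sorted before the reducer walks them, so only the last frame's changed pixel is flagged as foreground, while "cam2" is processed without reference to "cam1".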
Keywords :
parallel programming; video surveillance; MapReduce programming model; background subtraction; design methodology; distributed resources; independent input key-value pair processing; independent set processing; large-scale data processing; multiple two-phase data processing structure; two-phase data processing pattern design; Big data; Cloud computing; Computational modeling; Data models; Parallel processing; Sorting; MapReduce
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
Type :
conf
DOI :
10.1109/CLOUD.2014.55
Filename :
6973761