A Time Based Analysis of Data Processing on Hadoop Cluster

Author

Pal, Amrit ; Agrawal, Sanjay

Author_Institution

Dept. of Comput. Eng. & Applic., Nat. Inst. of Tech. Teachers' Training & Res. Bhopal, Bhopal, India

fYear

2014

Firstpage

608

Lastpage

612

Abstract

When data grows so large that a traditional database management system can no longer manage it, it is called Big Data. Managing data at this scale is difficult. Hadoop is a technological answer to Big Data. Storage of the data and retrieval of information from it are handled by the Hadoop Distributed File System and the MapReduce programming model. MapReduce provides effective benchmarks for retrieving information from Big Data. In this paper we present our experimental work on a Hadoop cluster. We analyze the time the cluster requires to process data as the number of nodes in the cluster increases. We started with a single node and then increased the node count by one each time. We analyze three types of time: real time, user time, and system time.

Keywords

Big Data; information retrieval; storage management; Hadoop cluster; Hadoop distributed file system; MapReduce programming model; data processing; data storage; real time; system time; time based analysis; user time; Distributed databases; File systems; Google; Real-time systems; Sorting; Data Node; Job Tracker; MapReduce; Name Node; Task Tracker

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence and Communication Networks (CICN), 2014 International Conference on

Print_ISBN

978-1-4799-6928-9

Type

conf

DOI

10.1109/CICN.2014.136

Filename

7065556