Title :
A big data implementation based on Grid computing
Author :
Garlasu, D. ; Sandulescu, V. ; Halcu, Ionela ; Neculoiu, Giorgian ; Grigoriu, O. ; Marinescu, Mariana ; Marinescu, Virgil
Author_Institution :
Core Technol., Oracle Romania, Bucharest, Romania
Abstract :
Big Data is a term defining data that has three main characteristics. First, it involves a great volume of data. Second, the data cannot be structured into regular database tables, and third, the data is produced with great velocity and must be captured and processed rapidly. Oracle adds a fourth characteristic for this kind of data: low value density, meaning that a very large volume of data must sometimes be processed before the valuable information it contains is found. Big Data is a relatively new term that came from the need of large companies such as Yahoo, Google, and Facebook to analyze large amounts of unstructured data, but the same need can be identified in a number of other large enterprises as well as in the research and development field. The framework for processing Big Data consists of a number of software tools that will be presented in the paper and are briefly listed here: Hadoop, an open source platform that consists of the Hadoop kernel, the Hadoop Distributed File System (HDFS), MapReduce, and several related tools. Two of the main problems that arise when working with Big Data are storage capacity and processing power, and that is where Grid Technologies can help. Grid Computing refers to a special kind of distributed computing. A Grid computing system must contain a Computing Element (CE) and a number of Storage Elements (SE) and Worker Nodes (WN). The CE provides the connection with other Grid networks and uses a Workload Management System to dispatch jobs to the Worker Nodes. The Storage Element is in charge of storing the input and output data needed for job execution. The main purpose of this article is to present a way of processing Big Data using Grid Technologies. To that end, the framework for managing Big Data is presented along with a way to implement it on top of a grid architecture.
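As an illustration of the MapReduce programming model mentioned in the abstract, below is a minimal word-count job written in Java against the standard Hadoop 2.x API. It is a generic sketch, not code from the paper: the class and path names are placeholders. The mapper emits (word, 1) pairs and the reducer sums the counts per word; in the setting described above, the input and output paths would reside in HDFS on the Storage Elements, while the map and reduce tasks would be executed on the Worker Nodes.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for every word in an input line, emit the pair (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all counts received for the same word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In a grid deployment such as the one the paper proposes, a job of this kind would be submitted through the Workload Management System of the Computing Element and scheduled onto the Worker Nodes, with the Storage Elements holding the HDFS input and output data.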
Keywords :
distributed databases; grid computing; public domain software; research and development; software architecture; storage allocation; CE; GRID networks; HDFS; Hadoop distributed file system; Hadoop kernel; Hadoop open source platform; MapReduce framework; SE; WN; Worker Nodes; big data implementation; big enterprises; computing element; data volume; distributed computing; grid architecture; grid computing; job dispatching; job execution; processing power; research and development field; software tools; storage capacity; storage elements; unstructured data analysis; worker nodes; workload management system; Big data; Conferences; Databases; File systems; Google; Grid computing; Servers; Big Data; Grid Technology; HDFS; Hadoop; Storage Element
Conference_Title :
RoEduNet International Conference (RoEduNet), 2013 11th
Conference_Location :
Sinaia
Print_ISBN :
978-1-4673-6114-9
DOI :
10.1109/RoEduNet.2013.6511732