Handling Big Data Efficiently by Using Map Reduce Technique

Author

Maitrey, Seema ; Jha, C.K.

fYear

2015

fDate

13-14 Feb. 2015

Firstpage

703

Lastpage

708

Abstract

Today's organizations capture extremely large amounts of data, and the volume continues to increase. Analyzing such huge data sets becomes computationally inefficient. Researchers have addressed the problem of discovering knowledge from these continuously growing large data sets. The quantity of available raw data has been increasing at a very high rate, and precious information is concealed in large databases. Data mining has become an interesting area for extracting this embedded information. Over many years it has taken root in all kinds of application areas, giving rise to many data mining methods that have been applied in several real-life fields. But not all of these methods are capable of handling huge collections of data. In recent years, a number of computation- and data-intensive scientific data analyses have been established. To perform large-scale data mining analyses that meet the scalability and performance requirements of big data, several efficient parallel and concurrent algorithms have been applied. Many parallel algorithms are implemented using different parallelization techniques, such as threads, MPI, and MapReduce, which yield different performance and usability characteristics. The MPI model works efficiently for computation-intensive problems, but it is complicated to bring into practical use. There is currently considerable enthusiasm around the MapReduce paradigm for large-scale data analysis. It is inspired by functional programming and allows distributed computations to be expressed over massive amounts of data. It is designed for large-scale data processing, as it can run on clusters of commodity hardware. MapReduce, a prominent parallel data processing tool, is gaining significant momentum in both industry and academia as the volume of data to analyze grows rapidly.
In this paper, we examine MapReduce, its advantages and disadvantages, and how it can be used in integration with other technologies.

Keywords

Big Data; concurrency control; data analysis; data mining; functional programming; parallel algorithms; MPI model; MapReduce paradigm; MapReduce technique; big data handling; concurrent algorithms; data intensive scientific data analysis; data mining methods; distributed computations; knowledge discovery; large scale data mining analyses; large-scale data analysis; large-scale data processing; parallel data processing tool; parallelization techniques; usability characteristics; Fault tolerance; Fault tolerant systems; Google; Radiation detectors; Clustering; DBMS; Hadoop; MapReduce; Parallel processing

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence & Communication Technology (CICT), 2015 IEEE International Conference on

Conference_Location

Ghaziabad

Print_ISBN

978-1-4799-6022-4

Type

conf

DOI

10.1109/CICT.2015.140

Filename

7078794

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=699014