Regional Information Center for Science and Technology - Data-intensive computing with map-reduce and hadoop

DocumentCode :

585849

Title :

Data-intensive computing with map-reduce and hadoop

Author :

Humbetov, Shamil

Author_Institution :

Dept. of Comput. Eng., Qafqaz Univ., Baku, Azerbaijan

fYear :

2012

fDate :

17-19 Oct. 2012

Firstpage :

Lastpage :

Abstract :

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. IDC sized the digital universe (information that is either created or captured in digital form and then replicated) at 161 exabytes in 2006, growing to 988 exabytes in 2010, a compound annual growth rate (CAGR) of 57%. A variety of system architectures have been implemented for data-intensive computing and large-scale data analysis applications, including parallel and distributed relational database management systems, which have been available to run on shared-nothing clusters of processing nodes for more than two decades. However, most data growth is in unstructured data, and new processing paradigms with more flexible data models were needed. Several solutions have emerged, including the MapReduce architecture pioneered by Google and now available in an open-source implementation called Hadoop, which is used by Yahoo, Facebook, and others. Twenty percent of the world's servers go into huge data centers run by the “Big 5”: Google, Microsoft, Yahoo, Amazon, and eBay [1].
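The 57% figure follows from the standard compound annual growth rate formula applied to IDC's two estimates, which are four years apart. A quick check, assuming annual compounding between the 2006 and 2010 values (V_2006 and V_2010 are notation introduced here for the two sizes, not symbols from the source):

```latex
\mathrm{CAGR} = \left(\frac{V_{2010}}{V_{2006}}\right)^{1/(2010-2006)} - 1
             = \left(\frac{988\ \text{EB}}{161\ \text{EB}}\right)^{1/4} - 1
             \approx 6.137^{0.25} - 1
             \approx 1.574 - 1
             \approx 0.57
```

The result, roughly 57.4% per year, is consistent with the rounded 57% quoted in the abstract.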

Keywords :

data analysis; data models; parallel databases; relational databases; sensors; CAGR; Hadoop; MapReduce architecture; cell phone GPS signals; climate information; compound annual growth rate; data-intensive computing; digital pictures; digital universe IDC sizing; distributed relational database management systems; flexible data models; large-scale data analysis applications; open-source implementation; parallel relational database management systems; processing nodes; processing paradigms; purchase transaction records; sensors; shared nothing clusters; social media sites; transaction records; Computational modeling; Computers; Data processing; Distributed databases; File systems; Google; Servers; Data Intensive Computing; Hadoop; MapReduce;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Application of Information and Communication Technologies (AICT), 2012 6th International Conference on

Conference_Location :

Tbilisi

Print_ISBN :

978-1-4673-1739-9

Type :

conf

DOI :

10.1109/ICAICT.2012.6398489

Filename :

6398489

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=585849