Title :
Data-intensive computing with map-reduce and hadoop
Author :
Humbetov, Shamil
Author_Institution :
Dept. of Comput. Eng., Qafqaz Univ., Baku, Azerbaijan
Abstract :
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. IDC sized the digital universe (information that is either created or captured in digital form and then replicated) at 161 exabytes in 2006, growing to 988 exabytes in 2010, a compound annual growth rate (CAGR) of 57%. A variety of system architectures have been implemented for data-intensive computing and large-scale data analysis applications, including parallel and distributed relational database management systems, which have been available to run on shared-nothing clusters of processing nodes for more than two decades. However, most data growth is in unstructured data, and new processing paradigms with more flexible data models were needed. Several solutions have emerged, including the MapReduce architecture pioneered by Google, which is now available in an open-source implementation called Hadoop and is used by Yahoo, Facebook, and others. Twenty percent of the world's servers go into the huge data centers of the “Big 5”: Google, Microsoft, Yahoo, Amazon, and eBay [1].
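The 57% growth rate quoted above can be checked from the two IDC figures with the standard CAGR formula, (end / start)^(1 / years) - 1, over the four years from 2006 to 2010. The following is a minimal sketch for verifying that arithmetic only; the function name `cagr` is hypothetical and does not come from the paper.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1.

    Hypothetical helper used only to check the figure quoted in the abstract.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC figures quoted in the abstract: 161 EB in 2006 and 988 EB in 2010.
rate = cagr(161, 988, 2010 - 2006)
print(f"CAGR = {rate:.1%}")  # about 57.4%, consistent with the quoted 57%
```

The result, roughly 57.4% per year, rounds to the 57% stated in the abstract.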
Keywords :
data analysis; data models; parallel databases; relational databases; sensors; CAGR; Hadoop; MapReduce architecture; cell phone GPS signals; climate information; compound annual growth rate; data-intensive computing; digital pictures; digital universe IDC sizing; distributed relational database management systems; flexible data models; large-scale data analysis applications; open-source implementation; parallel relational database management systems; processing nodes; processing paradigms; purchase transaction records; shared nothing clusters; social media sites; transaction records; Computational modeling; Computers; Data processing; Distributed databases; File systems; Google; Servers; Data Intensive Computing; MapReduce;
Conference_Title :
Application of Information and Communication Technologies (AICT), 2012 6th International Conference on
Conference_Location :
Tbilisi
Print_ISBN :
978-1-4673-1739-9
DOI :
10.1109/ICAICT.2012.6398489