Title :
The dawn of Big Data - Hbase
Author :
Bhupathiraju, Vijayalakshmi ; Ravuri, Ravi Prasad
Author_Institution :
Dept. of MCA, Padmasri Dr. B.V. Raju Inst. of Technology, Hyderabad, India
Abstract :
HBase is a distributed, column-oriented database built on top of HDFS. It is the Hadoop application to use when you need real-time, random read/write access to very large datasets: a scalable data store targeted at random reads and writes of (fairly) structured data. It is modeled after Google's Bigtable and designed to support large tables, on the order of billions of rows and millions of columns. It uses HDFS as its underlying file system and is designed to be fully distributed and highly available. Version 0.20 introduced significant performance improvements. HBase's TableInputFormat allows a MapReduce program to operate on data stored in an HBase table, and TableOutputFormat writes MapReduce output into an HBase table. HBase has different storage characteristics than HDFS, such as the ability to perform row updates and column indexing, so we can expect to see these features used by Hive in future releases; it is already possible to access HBase tables from Hive. This paper gives a step-by-step introduction to HBase: it identifies the differences between Apache HBase and a traditional RDBMS, describes the problem with relational database systems, explains the relationship between Hadoop and HBase, and shows how an Apache HBase table is physically stored on disk. The later part of the paper introduces MapReduce, the HBase table, how Apache HBase cells store data, and what happens to data when it is deleted. The last part explains the difference between Big Data and HBase, followed by the conclusion and references.
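The abstract's points about how HBase cells store data and what happens to data when it is deleted can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not the HBase client API: the class `ToyHTable`, the `TOMBSTONE` marker and the `major_compact` method are hypothetical names chosen for illustration. It models a table as rows kept in row-key order, with each cell addressed by row key, `family:qualifier` column and timestamp. A delete does not erase data right away; it writes a tombstone that masks older versions until a (simulated) major compaction physically removes them, which is a simplified version of HBase's documented behavior.

```python
# Hypothetical marker standing in for an HBase delete tombstone.
TOMBSTONE = object()


class ToyHTable:
    """Toy in-memory model of an HBase table's logical layout (NOT the HBase API).

    Each cell is addressed by (row key, 'family:qualifier', timestamp),
    and several timestamped versions of a cell may coexist.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # versions kept after compaction
        # row key -> column -> list of (timestamp, value), newest first
        self.rows = {}

    def put(self, row, column, value, ts):
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        # Newest first; on equal timestamps the tombstone sorts first,
        # so a delete masks a put carrying the same timestamp.
        versions.sort(key=lambda tv: (tv[0], tv[1] is TOMBSTONE), reverse=True)

    def delete(self, row, column, ts):
        # A delete only writes a tombstone; older versions stay on "disk".
        self.put(row, column, TOMBSTONE, ts)

    def get(self, row, column):
        """Return the newest visible value, or None if absent or deleted."""
        versions = self.rows.get(row, {}).get(column, [])
        if not versions:
            return None
        _, value = versions[0]
        return None if value is TOMBSTONE else value

    def scan(self, start_row, stop_row):
        """Yield (row, {column: latest value}) for start_row <= row < stop_row."""
        for row in sorted(self.rows):  # rows are ordered by row key
            if start_row <= row < stop_row:
                cells = {c: self.get(row, c) for c in self.rows[row]}
                cells = {c: v for c, v in cells.items() if v is not None}
                if cells:
                    yield row, cells

    def major_compact(self):
        """Physically drop tombstoned cells and versions beyond max_versions."""
        for row in list(self.rows):
            cols = self.rows[row]
            for column in list(cols):
                kept = []
                for ts, value in cols[column]:
                    if value is TOMBSTONE:
                        break  # tombstone and every older version are purged
                    kept.append((ts, value))
                kept = kept[: self.max_versions]
                if kept:
                    cols[column] = kept
                else:
                    del cols[column]
            if not cols:
                del self.rows[row]


t = ToyHTable(max_versions=2)
t.put("row1", "info:name", "alice", ts=1)
t.put("row1", "info:name", "alicia", ts=2)
print(t.get("row1", "info:name"))         # alicia
t.delete("row1", "info:name", ts=3)
print(t.get("row1", "info:name"))         # None
print(len(t.rows["row1"]["info:name"]))   # 3: data remains until compaction
t.major_compact()
print("row1" in t.rows)                   # False
```

The design choice to keep deleted versions until compaction mirrors why, in HBase, a delete is cheap at write time while disk space is only reclaimed later; the real system does this across MemStores and HFiles on HDFS, which this sketch does not attempt to model.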
Keywords :
Big Data; data structures; distributed databases; Google big table; HBase table; HDFS; Hadoop application; Hive; Map Reduce outputs; Map Reduce program; big data; column indexing; distributed column-oriented database; random read-write access; read-write random access; row updates; scalable data store; structured data; table input format; table output format; Compaction; Electronic publishing; Encyclopedias; HTML; Internet; Monitoring; HBase; HBase column oriented table; Hadoop Distributed File System (HDFS);
Conference_Title :
IT in Business, Industry and Government (CSIBIG), 2014 Conference on
Print_ISBN :
978-1-4799-3063-0
DOI :
10.1109/CSIBIG.2014.7056952