Considerations for big data: Architecture and approach

Author

Bakshi, Kapil

Author_Institution

Cisco Syst. Inc., Herndon, VA, USA

fYear

2012

fDate

3-10 March 2012

Firstpage

1

Lastpage

7

Abstract

The amount of data in our industry and the world is exploding. Data is being collected and stored at unprecedented rates. The challenge is not only to store and manage the vast volume of data (“big data”), but also to analyze and extract meaningful value from it. There are several approaches to collecting, storing, processing, and analyzing big data. The main focus of the paper is on unstructured data analysis. Unstructured data refers to information that either does not have a pre-defined data model or does not fit well into relational tables. Unstructured data is the fastest-growing type of data; examples include imagery, sensor data, telemetry, video, documents, log files, and email data files. There are several techniques to address this problem space of unstructured analytics. These techniques share the common characteristics of scale-out, elasticity, and high availability. MapReduce, in conjunction with the Hadoop Distributed File System (HDFS) and the HBase database, as part of the Apache Hadoop project, is a modern approach to analyzing unstructured data. Hadoop clusters are an effective means of processing massive volumes of data, and can be improved with the right architectural approach.

Keywords

SQL; data analysis; Apache Hadoop project; HBase database; Hadoop cluster; Hadoop distributed file system; MapReduce; NoSQL; architectural approach; data management; data storage; unstructured data analysis; Availability; Benchmark testing; Computer architecture; Distributed databases; File systems; Relational databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Aerospace Conference, 2012 IEEE

Conference_Location

Big Sky, MT

ISSN

1095-323X

Print_ISBN

978-1-4577-0556-4

Type

conf

DOI

10.1109/AERO.2012.6187357

Filename

6187357