DocumentCode
1887745
Title
Considerations for big data: Architecture and approach
Author
Bakshi, Kapil
Author_Institution
Cisco Syst. Inc., Herndon, VA, USA
fYear
2012
fDate
3-10 March 2012
Firstpage
1
Lastpage
7
Abstract
The amount of data in our industry and the world is exploding. Data is being collected and stored at unprecedented rates. The challenge is not only to store and manage the vast volume of data ("big data"), but also to analyze and extract meaningful value from it. There are several approaches to collecting, storing, processing, and analyzing big data. The main focus of this paper is unstructured data analysis. Unstructured data refers to information that either does not have a predefined data model or does not fit well into relational tables. Unstructured data is the fastest-growing type of data; examples include imagery, sensor and telemetry data, video, documents, log files, and email files. There are several techniques to address this problem space of unstructured analytics, and they share the common characteristics of scale-out, elasticity, and high availability. MapReduce, used in conjunction with the Hadoop Distributed File System (HDFS) and the HBase database as part of the Apache Hadoop project, is a modern approach to analyzing unstructured data. Hadoop clusters are an effective means of processing massive volumes of data, and their effectiveness can be improved with the right architectural approach.
Keywords
SQL; data analysis; Apache Hadoop project; HBase database; Hadoop cluster; Hadoop distributed file system; MapReduce; NoSQL; architectural approach; data management; data storage; unstructured data analysis; Availability; Benchmark testing; Computer architecture; Distributed databases; File systems; Relational databases;
fLanguage
English
Publisher
ieee
Conference_Titel
Aerospace Conference, 2012 IEEE
Conference_Location
Big Sky, MT
ISSN
1095-323X
Print_ISBN
978-1-4577-0556-4
Type
conf
DOI
10.1109/AERO.2012.6187357
Filename
6187357