DocumentCode
117246
Title
Big data dimensional analysis
Author
Gadepally, Vijay ; Kepner, Jeremy
Author_Institution
MIT Lincoln Lab., Lexington, MA, USA
fYear
2014
fDate
9-11 Sept. 2014
Firstpage
1
Lastpage
6
Abstract
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is a prerequisite to applying advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to anticipate a priori. Current approaches to understanding data structure are drawn from traditional database ontology design. These approaches work, but they often require too much human involvement to keep pace with the volume, velocity, and variety of data encountered by big data systems. Dimensional Data Analysis (DDA) is a proposed technique that allows big data analysts to quickly understand the overall structure of a big dataset and determine its anomalies. DDA exploits structures that exist in a wide class of data to quickly determine the nature of the data and its statistical anomalies. DDA leverages existing schemas that are employed in big data databases today. This paper presents DDA, applies it to a number of data sets, and measures its performance. DDA has low overhead and can be applied to existing big data systems without greatly increasing their computing requirements.
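The abstract describes DDA only at a high level. As a rough illustration of the general idea (summarizing each dimension of a dataset to expose its structure and outliers), the sketch below assumes a key-value "exploded" schema in which every `field|value` pair of a record becomes its own column key. This schema choice, the function names `explode`, `dimension_summary`, and `flag_dimensions`, the `min_coverage` threshold, and both flagging rules are hypothetical and are not taken from the paper; it is a sketch under those assumptions, not the authors' algorithm.

```python
from collections import defaultdict


def explode(records):
    """Flatten {row_id: {field: value}} into (row_id, "field|value") pairs.

    Assumed schema: each distinct field|value pair becomes its own column key.
    Field names are assumed not to contain "|".
    """
    pairs = []
    for row_id, rec in records.items():
        for field, value in rec.items():
            pairs.append((row_id, f"{field}|{value}"))
    return pairs


def dimension_summary(pairs, n_rows):
    """Per field (dimension): fraction of rows containing it and number of distinct values."""
    rows = defaultdict(set)
    values = defaultdict(set)
    for row_id, col in pairs:
        field, value = col.split("|", 1)  # recover the field name from the column key
        rows[field].add(row_id)
        values[field].add(value)
    return {
        f: {"coverage": len(rows[f]) / n_rows, "distinct": len(values[f])}
        for f in rows
    }


def flag_dimensions(summary, n_rows, min_coverage=0.75):
    """Hypothetical heuristics: label identity-like fields and fields missing from a few rows."""
    flags = {}
    for f, s in summary.items():
        if s["coverage"] == 1.0 and s["distinct"] == n_rows:
            flags[f] = "identity-like (unique per row)"
        elif min_coverage <= s["coverage"] < 1.0:
            flags[f] = "nearly universal but missing in some rows"
    return flags


if __name__ == "__main__":
    # Toy, made-up records used only to exercise the sketch.
    records = {
        "r1": {"id": "1", "proto": "tcp", "port": "80"},
        "r2": {"id": "2", "proto": "tcp", "port": "443"},
        "r3": {"id": "3", "proto": "udp", "port": "53"},
        "r4": {"id": "4", "proto": "tcp"},
    }
    summary = dimension_summary(explode(records), len(records))
    print(flag_dimensions(summary, len(records)))
    # -> {'id': 'identity-like (unique per row)', 'port': 'nearly universal but missing in some rows'}
```

Because the summary is built from a single pass over the exploded pairs, its cost grows linearly with the number of entries, which is in the spirit of the low-overhead claim above; any real system would compute such counts inside the database rather than in memory.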
Keywords
Big Data; data analysis; data structures; database management systems; ontologies (artificial intelligence); pattern recognition; Big Data; DDA; data patterns; data structure; database ontology design; dimensional data analysis; innovative tools; statistical anomalies; Algorithm design and analysis; Arrays; Big data; Distributed databases; Nickel; Big Data; Data Analytics; Dimensional Analysis;
fLanguage
English
Publisher
ieee
Conference_Titel
High Performance Extreme Computing Conference (HPEC), 2014 IEEE
Conference_Location
Waltham, MA
Print_ISBN
978-1-4799-6232-7
Type
conf
DOI
10.1109/HPEC.2014.7040944
Filename
7040944