Title :
Toward an Ecosystem for Precision Sharing of Segmented Big Data
Author :
Shtern, Mark ; Simmons, Bradley ; Smit, Meint ; Litoiu, Marin
Author_Institution :
York Univ., Toronto, ON, Canada
Date :
June 28 to July 3, 2013
Abstract :
As the amount of data created and stored by organizations continues to increase, attention is turning to extracting knowledge from that raw data, including making some data available outside of the organization to enable crowd analytics. The adoption of the MapReduce paradigm has made processing Big Data more accessible, but it is still limited to data that is currently available, often only within an organization. Fine-grained control over what information is shared outside an organization is difficult to achieve with Big Data, particularly in the MapReduce model. We introduce a novel approach to sharing that enables fine-grained control over what data is shared. Users submit analytics tasks that run on infrastructure near the actual data, reducing network bottlenecks. Organizations allow access to a logical version of their data created at runtime by filtering and transforming the actual data without creating storage-intensive stale copies, and resellers can further segment or augment this data to provide added value to analytics tasks. A loosely coupled ecosystem driven by web services allows for discovery and sharing with a flexible, secure environment that limits the knowledge those running analytics need to have about the actual provider of the data. We describe a proof-of-concept implementation of the various components required to realize this ecosystem, and present a set of experiments to demonstrate feasibility, showing advantageous performance-versus-storage trade-offs.
Keywords :
Web services; data analysis; Big Data segmentation; MapReduce paradigm; crowd analytics; fine-grained control; knowledge extraction; loosely-coupled ecosystem; network bottleneck reduction; precision sharing; raw data; Access control; Data handling; Data storage systems; Ecosystems; Information management; Organizations; MapReduce; big data; cloud; Hadoop
Conference_Title :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.131