DocumentCode :
1787463
Title :
Development of a Semi-synthetic Dataset as a Testbed for Big-Data Semantic Analytics
Author :
Techentin, Robert ; Foti, Dora ; Li, Peng ; Daniel, E. ; Gilbert, Barry ; Holmes, David ; Al-Saffar, Sinan
Author_Institution :
Mayo Clinic, Rochester, MN, USA
fYear :
2014
fDate :
16-18 June 2014
Firstpage :
252
Lastpage :
253
Abstract :
We have developed a large semi-synthetic, semantically rich dataset, modeled after the medical record of a large medical institution. Using the highly diverse data.gov data repository and a multivariate data augmentation strategy, we can generate arbitrarily large semi-synthetic datasets which can be used to test new algorithms and computational platforms. The construction process and basic data characterization are described. The databases, as well as code for data collection, consolidation, and augmentation are available for distribution.
Keywords :
Big Data; data analysis; medical information systems; relational databases; very large databases; big-data semantic analytics; data augmentation; data collection; data consolidation; data.gov data repository; medical institution; medical record; multivariate data augmentation strategy; semisynthetic dataset development; Benchmark testing; Complexity theory; Conferences; Distributed databases; Resource description framework; Semantics; RDF; big data; data.gov; graph computing; semantic representation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Semantic Computing (ICSC), 2014 IEEE International Conference on
Conference_Location :
Newport Beach, CA
Print_ISBN :
978-1-4799-4002-8
Type :
conf
DOI :
10.1109/ICSC.2014.45
Filename :
6882033