DocumentCode
1787463
Title
Development of a Semi-synthetic Dataset as a Testbed for Big-Data Semantic Analytics
Author
Techentin, Robert ; Foti, Dora ; Li, Peng ; Daniel, E. ; Gilbert, Barry ; Holmes, David ; Al-Saffar, Sinan
Author_Institution
Mayo Clinic, Rochester, MN, USA
fYear
2014
fDate
16-18 June 2014
Firstpage
252
Lastpage
253
Abstract
We have developed a large semi-synthetic, semantically rich dataset, modeled after the medical record of a large medical institution. Using the highly diverse data.gov data repository and a multivariate data augmentation strategy, we can generate arbitrarily large semi-synthetic datasets that can be used to test new algorithms and computational platforms. The construction process and basic data characterization are described. The databases, as well as code for data collection, consolidation, and augmentation, are available for distribution.
Keywords
Big Data; data analysis; medical information systems; relational databases; very large databases; big-data semantic analytics; data augmentation; data collection; data consolidation; data.gov data repository; medical institution; medical record; multivariate data augmentation strategy; semisynthetic dataset development; Benchmark testing; Complexity theory; Conferences; Distributed databases; Resource description framework; Semantics; RDF; big data; data.gov; graph computing; semantic representation
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing (ICSC), 2014 IEEE International Conference on
Conference_Location
Newport Beach, CA
Print_ISBN
978-1-4799-4002-8
Type
conf
DOI
10.1109/ICSC.2014.45
Filename
6882033