Title :
Scientific discovery through weighted sampling
Author :
Sidirourgos, Lefteris ; Kersten, M. ; Boncz, P.
Author_Institution :
Database Architectures, CWI, Amsterdam, Netherlands
Abstract :
Scientific discovery has shifted from an exercise in theory and computation to the exploration of an ocean of observational data. Scientists explore data originating from modern scientific instruments in order to discover interesting aspects of it and formulate their hypotheses. Such workloads press for new database functionality. We aim at sampling scientific databases to create many different impressions of the data, on which scientists can quickly evaluate exploratory queries. However, scientific databases introduce different challenges for sample construction than classical business analytical applications do. We propose adaptive weighted sampling as an alternative to uniform sampling. With weighted sampling, only the most informative data is sampled, so more of the data relevant to the scientific discovery is available for examining a hypothesis. Relevant data is taken to be the focal points of the scientific search, which can be defined either a priori through functions or by monitoring the query workload. We study such query workloads and detail different families of weight functions. Finally, we give a quantitative and qualitative evaluation of weighted sampling.
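The contrast between uniform and weighted sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian weight function `focal_weight`, its `focal_point` and `bandwidth` parameters, and the helper `weighted_sample` are hypothetical names introduced here, and the sampling step uses the standard Efraimidis-Spirakis key technique for weighted sampling without replacement, which the abstract does not name.

```python
import math
import random


def focal_weight(value, focal_point, bandwidth):
    """Hypothetical a priori weight function: a Gaussian bump around a focal point."""
    z = (value - focal_point) / bandwidth
    return math.exp(-0.5 * z * z)


def weighted_sample(rows, weights, k, rng):
    """Draw up to k distinct rows, favoring rows with larger weights.

    Each row gets the key log(u) / w with u uniform in (0, 1]; the k rows
    with the largest keys are kept (Efraimidis-Spirakis, in log form).
    This is an assumed stand-in, not the method evaluated in the paper.
    """
    keyed = []
    for row, w in zip(rows, weights):
        if w <= 0:
            continue  # rows with non-positive weight are never sampled
        u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u) is defined
        keyed.append((math.log(u) / w, row))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in keyed[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    values = list(range(1000))  # stand-in for one numeric column
    weights = [focal_weight(v, focal_point=500, bandwidth=25) for v in values]

    uniform = rng.sample(values, 50)
    weighted = weighted_sample(values, weights, 50, rng)

    def spread(sample):
        # Mean distance of the sampled values from the focal point.
        return sum(abs(v - 500) for v in sample) / len(sample)

    print("uniform  mean distance to focal point:", spread(uniform))
    print("weighted mean distance to focal point:", spread(weighted))
```

Under these assumptions the weighted sample concentrates near the focal point, while the uniform sample spreads over the whole value range. A workload-driven variant would derive the weights from observed query predicates instead of a fixed function.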
Keywords :
database management systems; query processing; sampling methods; scientific information systems; adaptive weighted sampling; business analytical applications; database functionality; exploratory queries; informative data; observational data; query workload monitoring; scientific databases; scientific discovery; scientific instruments; scientific search; weight functions; Astronomy; Business; Earthquakes; Histograms; Image color analysis; Relational databases; Sampling; Scientific Data Management;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691587