Title :
Sampling strategies for mining in data-scarce domains
Author :
Ramakrishnan, N.; Bailey-Kellogg, Chris
Author_Institution :
Virginia Tech, VA, USA
Abstract :
A novel framework leverages physical properties for mining in data-scarce domains. It interleaves bottom-up data mining with top-down data collection, leading to effective and explainable sampling strategies. This article describes focused sampling strategies for mining scientific data. Our approach is based on the Spatial Aggregation Language (SAL), which supports the bottom-up construction of data interpretation and control design applications for spatially distributed physical systems. Used as a basis for describing data mining algorithms, SAL programs also help exploit knowledge of physical properties, such as continuity and locality, in data fields. We also introduce a top-down sampling strategy that focuses data collection on only those regions deemed most important to a data mining objective.
Keywords :
data acquisition; data mining; eigenvalues and eigenfunctions; natural sciences computing; optimisation; sampling methods; data collection; data-scarce domains; eigenvalues; optimization; scientific data; spatial aggregation language; top-down sampling; Aerodynamics; Analytical models; Computational modeling; Data engineering; Design engineering; Distributed computing; Process design; Propulsion
Journal_Title :
Computing in Science & Engineering
DOI :
10.1109/MCISE.2002.1014978