DocumentCode
3717361
Title
An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets
Author
Fatih Akdag;Christoph F. Eick
Author_Institution
Computer Science Department, University of Houston
fYear
2015
Firstpage
2010
Lastpage
2019
Abstract
We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert's notion of interestingness, captured by an interestingness function. This paper centers on finding interestingness hotspots in very large gridded datasets, which are quite common in scientific computing. Mining large gridded datasets with many variables and measurements requires a scalable framework that can process large amounts of data efficiently. In our recent work, we proposed a computational framework which discovers interestingness hotspots in gridded datasets using a 3-step approach consisting of seeding, hotspot growing and post-processing steps. In this paper, we significantly improve the efficiency of the framework by utilizing parallel processing and employing more efficient data structures and algorithms. We propose a novel heap-based hotspot growing algorithm which significantly reduces the cost of the hotspot growing phase. In addition, we propose a graph-based preprocessing algorithm which decreases the number of hotspots grown by merging some hotspot seeds. Other improvements to the framework involve incremental calculation of interestingness functions and growing hotspots in parallel. The improved framework is evaluated in a case study on a very large 4-dimensional gridded air pollution dataset in which we find interestingness hotspots with respect to pollutants.
Keywords
"Algorithm design and analysis","Atmospheric measurements","Merging","Clustering algorithms","Complexity theory","Pollution measurement","Runtime"
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7363982
Filename
7363982