• DocumentCode
    3717361
  • Title
    An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets
  • Author
    Fatih Akdag;Christoph F. Eick
  • Author_Institution
    Computer Science Department, University of Houston
  • fYear
    2015
  • Firstpage
    2010
  • Lastpage
    2019
  • Abstract
    We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert's notion of interestingness captured by an interestingness function. This paper centers on finding interestingness hotspots in very large gridded datasets, which are quite common in scientific computing. Mining large gridded datasets with many variables and measurements requires a scalable framework that can process large amounts of data efficiently. In our recent work, we proposed a computational framework which discovers interestingness hotspots in gridded datasets using a 3-step approach consisting of seeding, hotspot growing and post-processing steps. In this paper, we significantly improve the efficiency of the framework by utilizing parallel processing and employing more efficient data structures and algorithms. We propose a novel heap-based hotspot growing algorithm which significantly reduces the cost of the hotspot growing phase. In addition, we propose a graph-based preprocessing algorithm which decreases the number of hotspots grown by merging some hotspot seeds. Other improvements to the framework involve incremental calculation of interestingness functions and growing hotspots in parallel. The improved framework is evaluated in a case study on a very large 4-dimensional gridded air pollution dataset in which we find interestingness hotspots with respect to pollutants.
  • Keywords
    "Algorithm design and analysis","Atmospheric measurements","Merging","Clustering algorithms","Complexity theory","Pollution measurement","Runtime"
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363982
  • Filename
    7363982