• DocumentCode
    246318
  • Title
    A Framework for Managing Continuous Query Evaluations over Voluminous, Multidimensional Datasets
  • Author
    Tolooee, Cameron; Malensek, Matthew; Pallickara, Sangmi Lee
  • Author_Institution
    Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    73
  • Lastpage
    82
  • Abstract
    Efficient access to voluminous multidimensional datasets is essential for several scientific applications, including real-time analysis and visualization. Fast-evolving datasets present unique challenges during retrievals. Keeping data up to date can be expensive, because it may involve repeated data queries, excessive data movement, and redundant data preprocessing. This paper focuses on efficiently manipulating query results when the dataset is continuously evolving. Our approach provides an automated and scalable tracking and caching mechanism to evaluate continuous queries over data stored in a distributed storage system. Among the storage nodes, one or more nodes are selected using an election algorithm based on CPU and memory utilization. These selected nodes ensure that the query output contains the most recent data arrivals, and they cache the metadata of the query output. This approach is evaluated in the context of Galileo, our distributed data storage framework. Galileo is designed for managing multidimensional time-series datasets generated in geospatial observational settings, e.g., data generated by remote sensing equipment and sensor networks. We describe how we use the metadata graph to push data preprocessing jobs onto the storage system during continuous query processing and to selectively download subsets of the query output. Our performance benchmarks demonstrate the efficacy of our approach.
  • Keywords
    cache storage; data analysis; query processing; storage management; time series; CPU; Galileo; automated tracking mechanism; caching mechanism; continuous query evaluation management; continuous query processing; distributed data storage framework; election algorithm; excessive data movements; geospatial observational settings; memory utilization; metadata; metadata graph; multidimensional time-series datasets; redundant data preprocessing; remote sensing equipment; repeated data queries; scalable tracking mechanism; sensor networks; storage nodes; voluminous multidimensional datasets; Distributed databases; Filtering algorithms; Geospatial analysis; Humidity; Nominations and elections; Query processing; Real-time systems; Continuous Query; Galileo; Query caching; Time series data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Autonomic Computing (ICCAC), 2014 International Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/ICCAC.2014.25
  • Filename
    7024047