• DocumentCode
    2484212
  • Title
    A partition-based approach to support streaming updates over persistent data in an active datawarehouse

  • Author
    Chakraborty, Abhirup ; Singh, Ajit

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2009
  • fDate
    23-29 May 2009
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    Active warehousing has emerged to meet high user demand for fresh, up-to-date information. Online refreshment of source updates introduces processing and disk overheads in the implementation of warehouse transformations. This paper considers a frequently occurring operator in active warehousing that computes the join between a fast, time-varying or bursty update stream S and a persistent disk relation R, using limited memory. Such a join operation is the crux of a number of common transformations (e.g., surrogate key assignment, duplicate detection, etc.) in an active data warehouse. We propose a partition-based join algorithm that minimizes the processing overhead, the disk overhead, and the delay of output tuples. The proposed algorithm exploits the spatio-temporal locality within the update stream: it reduces the delay of output tuples by exploiting hot-spots in the range or domain of the joining attributes, and at the same time shares the I/O cost of accessing disk data of relation R over a volume of tuples from update stream S. We present experimental results showing the effectiveness of the proposed algorithm.
  • Keywords
    active databases; data handling; data warehouses; active data warehouse; active warehousing; bursty update stream; disk overhead; online refreshment; output tuple delay; partition-based join algorithm; persistent data; persistent disk relation; processing overhead; source updates; spatio-temporal locality; time varying stream; warehouse transformations; Costs; Data mining; Data warehouses; Delay effects; Partitioning algorithms; Pipelines; Table lookup; Warehousing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
  • Conference_Location
    Rome
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-3751-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2009.5161064
  • Filename
    5161064
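
The abstract's central idea, sharing one read of a partition of R across many buffered tuples of S, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple range partitioning of the join key, uses an in-memory dict to stand in for the disk relation R, and flushes a partition as soon as a fixed number of stream tuples is pending for it. Hot-spot handling and memory management are omitted. The names `PartitionedStreamJoin`, `boundaries`, `batch_size`, and `partition_loads` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class PartitionedStreamJoin:
    """Joins stream tuples against a range-partitioned relation R,
    paying one (simulated) partition load per batch of stream tuples."""

    def __init__(self, relation, boundaries, batch_size):
        # relation: dict mapping join key -> R value (stand-in for disk data).
        # boundaries: sorted split points; partition i holds keys in
        # [boundaries[i-1], boundaries[i]).
        self.boundaries = boundaries
        self.batch_size = batch_size
        self.partitions = defaultdict(dict)
        for key, value in relation.items():
            self.partitions[self._pid(key)][key] = value
        self.pending = defaultdict(list)  # partition id -> buffered S tuples
        self.partition_loads = 0          # counts simulated disk reads

    def _pid(self, key):
        # Index of the range partition that contains this key.
        return bisect_right(self.boundaries, key)

    def _flush(self, pid):
        # One simulated read of the partition serves every pending tuple.
        self.partition_loads += 1
        r_part = self.partitions.get(pid, {})
        batch = self.pending.pop(pid, [])
        return [(k, payload, r_part[k]) for k, payload in batch if k in r_part]

    def insert(self, key, payload):
        # Buffer the stream tuple; join only once the batch is large enough.
        pid = self._pid(key)
        self.pending[pid].append((key, payload))
        if len(self.pending[pid]) >= self.batch_size:
            return self._flush(pid)
        return []

    def drain(self):
        # Join all remaining buffered tuples, one partition load each.
        out = []
        for pid in sorted(self.pending):
            out.extend(self._flush(pid))
        return out
```

Raising `batch_size` spreads each partition read over more stream tuples but makes tuples wait longer before they are joined, which is the trade-off between disk overhead and output delay that the abstract refers to.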