  • DocumentCode
    2192157
  • Title
    DWS-AQA: a cost effective approach for very large data warehouses
  • Author
    Bernardino, Jorge ; Furtado, Pedro ; Madeira, Henrique
  • Author_Institution
    ISEC - DEIS, Inst. Polytech. of Coimbra, Portugal
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    233
  • Lastpage
    242
  • Abstract
    Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to ensure interactive response times for users. Large data warehouses require a very expensive setup, typically based on high-end servers or high-performance clusters. In this paper we propose and evaluate a simple but very effective method to implement a data warehouse using the computers and workstations typically available in large organizations. The proposed approach is called data warehouse striping with approximate query answering (DWS-AQA). The goal is to use the processing and disk capacity normally available in large workstation networks to implement a data warehouse at a very low infrastructure cost. Because the data warehouse shares computers that are also used for other purposes, most of the time only a fraction of the computers will be able to execute their partial queries in time. However, as we show in the paper, the approximate answers estimated from partial results have a very small error in most plausible scenarios. Moreover, because the data warehouse facts are partitioned in a strictly uniform way, it is possible to calculate tight confidence intervals for the approximate answers, providing the user with a measure of the accuracy of the query results. A set of experiments on the TPC-H benchmark database is presented to show the accuracy of DWS-AQA across a large number of scenarios.
  • Keywords
    data warehouses; query processing; workstation clusters; DWS-AQA; data warehouse striping with approximate query answering; disk capacity; infrastructure cost; interactive response time; large workstation networks; partial queries; processing capacity; scalable architecture; tight confidence intervals; very large data warehouses; Computer networks; Concurrent computing; Costs; Data warehouses; Databases; Delay; Technology management; Time sharing computer systems; Warehousing; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Applications Symposium, 2002. Proceedings. International
  • ISSN
    1098-8068
  • Print_ISBN
    0-7695-1638-6
  • Type
    conf
  • DOI
    10.1109/IDEAS.2002.1029676
  • Filename
    1029676