  • DocumentCode
    1831096
  • Title
    VFDS: Very fast database sampling system
  • Author
    Buda, Teodora Sandra ; Cerqueus, Thomas ; Kristiansen, M. ; Murphy, John
  • Author_Institution
    Performance Eng. Lab., Univ. Coll. Dublin, Dublin, Ireland
  • fYear
    2013
  • fDate
    14-16 Aug. 2013
  • Firstpage
    153
  • Lastpage
    160
  • Abstract
    Database sampling has proved to be a powerful technique in a wide range of application areas (e.g. data mining, approximate query evaluation, histogram construction). It is generally used when the computational cost of processing large amounts of information is extremely high and a faster response with a lower level of accuracy is preferred. Previous sampling techniques achieve this balance; however, the cost of the database sampling process itself should also be evaluated. We argue that current relational database sampling techniques that maintain the data integrity of the sample database perform poorly and that a faster strategy needs to be devised. In this paper we propose a very fast sampling method that keeps the referential integrity of the sample database intact. The sampling method targets the production environment of a system under development, which generally consists of large amounts of data that are computationally costly to analyze. We evaluate our method against previous database sampling approaches and show that it produces a sample database at least 300 times faster, with a maximum trade-off of 0.5% in terms of sample size error.
  • Keywords
    data integrity; relational databases; sampling methods; VFDS; production environment; referential integrity; relational database sampling techniques; sample size error; very fast database sampling system; Computer science; Diamonds; Educational institutions; Production; Relational databases; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1109/IRI.2013.6642466
  • Filename
    6642466