• DocumentCode
    3324434
  • Title
    Designing Random Sample Synopses with Outliers
  • Author
    Rösch, Philipp; Gemulla, Rainer; Lehner, Wolfgang
  • Author_Institution
    Database Technol. Group, Tech. Univ. Dresden, Dresden
  • fYear
    2008
  • fDate
    7–12 April 2008
  • Firstpage
    1400
  • Lastpage
    1402
  • Abstract
    Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of "outliers" in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
  • Keywords
    database indexing; random processes; sampling methods; very large databases; large dataset synopses design; multiple aggregation column; outlier indexing scheme; outlier-aware sample synopses; random sampling; Aggregates; Computer science; Data analysis; Estimation error; Image databases; Indexing; Large-scale systems; Query processing; Sampling methods; Streaming media;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IEEE 24th International Conference on Data Engineering (ICDE 2008)
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497569
  • Filename
    4497569