• DocumentCode
    813615
  • Title

    Domain-driven data synopses for dynamic quantiles

  • Author

    Gilbert, Anna C. ; Kotidis, Yannis ; Muthukrishnan, S. ; Strauss, Martin J.

  • Author_Institution
    Dept. of Math., Michigan Univ., Ann Arbor, MI, USA
  • Volume
    17
  • Issue
    7
  • fYear
    2005
  • fDate
    7/1/2005 12:00:00 AM
  • Firstpage
    927
  • Lastpage
    938
  • Abstract
    In this paper, we present new algorithms for dynamically computing quantiles of a relation subject to insert as well as delete operations. At the core of our algorithms lies a small-space multiresolution representation of the underlying data distribution based on random subset sums or RSSs. These RSSs are updated with every insert and delete operation. When quantiles are demanded, we use these RSSs to estimate quickly, without having to access the data, all the quantiles, each guaranteed to be accurate to within user-specified precision. While quantiles have found many uses in databases, in this paper, our focus is primarily on network management applications that monitor the distribution of active sessions in the network. Our examples are drawn both from the telephony and the IP network, where the goal is to monitor the distribution of the length of active calls and IP flows, respectively, over time. For such applications, we propose a new type of histogram that uses RSSs for summarizing the dynamic parts of the distributions while other parts with small volume of sessions are approximated using simple counters.
  • Keywords
    data analysis; data mining; data structures; relational databases; statistical analysis; IP network; active session distribution monitoring; data distribution; data streams; database statistics; delete operation; domain-driven data synopses; dynamic quantiles; hybrid histogram data structure; insert operation; network management applications; random subset sums; relational database; small-space multiresolution data representation; telephony; Counting circuits; Database systems; Heuristic algorithms; Histograms; IP networks; Monitoring; Query processing; Statistical distributions; Telephony; Transaction databases; Index Terms- Quantiles; data streams.; database statistics;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2005.108
  • Filename
    1432702