• DocumentCode
    888563
  • Title

    Approximate processing of massive continuous quantile queries over high-speed data streams

  • Author

    Lin, Xuemin ; Xu, Jian ; Zhang, Qing ; Lu, Hongjun ; Yu, Jeffrey Xu ; Zhou, Xiaofang ; Yuan, Yidong

  • Author_Institution
    NICTA, Australia
  • Volume
    18
  • Issue
    5
  • fYear
    2006
  • fDate
    5/1/2006 12:00:00 AM
  • Firstpage
    683
  • Lastpage
    698
  • Abstract
    Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ,ε), the data item at rank φN may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
  • Keywords
    computational complexity; data mining; query processing; data mining; financial data analysis; high-speed data stream; massive continuous quantile query processing; online computation; quantile computation; sliding window; Application software; Clustering algorithms; Computer Society; Computer applications; Computer science; Costs; Data analysis; Data mining; Query processing; Stock markets; Query processing; data mining.; online computation;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2006.73
  • Filename
    1613870