Title :
Finding correlated heavy-hitters over data streams
Author :
Lahiri, Bibudh ; Tirthapura, Srikanta
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
Abstract :
We consider online mining of correlated heavy-hitters (CHH) from network data streams. Given a multidimensional dataset, a correlated aggregate query first filters a subset by applying a predicate along a primary dimension, and then computes aggregates along a secondary dimension of that subset data. We consider queries of the following form: "In a stream of (x, y) tuples, on the subset H of all x values that are heavy-hitters, maintain those y values that occur frequently with the x values in H". This query arises naturally in situations where we need to track not only the identity of frequently occurring elements in a stream, but also additional information associated with these elements along other dimensions. Prior work on tracking heavy-hitters has focused only on tracking the identity and frequency of heavy-hitters on a single dimensional stream, and yield little information about correlated heavy-hitters. Our online data stream algorithm is easy to implement and uses workspace which is orders of magnitude smaller than the stream itself. We present provable guarantees on the maximum error estimates, as well as experimental results, that demonstrate the space-accuracy trade-off on a large data stream of packet headers from a backbone network link.
Keywords :
data mining; query processing; backbone network link; correlated aggregate query; correlated heavy-hitters; maximum error estimates; network data streams; online mining; packet headers; space-accuracy trade-off; Aggregates; Data engineering; Databases; Filters; Frequency; Multidimensional systems; Network servers; Spine; Telecommunication traffic; Data streams; correlation; heavy-hitters; sketches;
Conference_Titel :
Performance Computing and Communications Conference (IPCCC), 2009 IEEE 28th International
Conference_Location :
Scottsdale, AZ
Print_ISBN :
978-1-4244-5737-3
DOI :
10.1109/PCCC.2009.5403820