Abstract :
We consider the problem of estimating cascaded aggregates over a matrix presented as a sequence of updates in a data stream. A cascaded aggregate P · Q is defined by evaluating aggregate Q repeatedly over each row of the matrix, and then evaluating aggregate P over the resulting vector of values. This problem was introduced by Cormode and Muthukrishnan, PODS, 2005 [CM]. We analyze the space complexity of estimating cascaded norms on an n à d matrix to within a small relative error. Let Lp denote the p-th norm, where p is a non-negative integer. We abbreviate the cascaded norm Lk · Lp by Lk,p. (1) For any constant k ¿ p ¿ 2, we obtain a 1-pass O¿(n1-2/kd1-2/p)-space algorithm for estimating Lk,p. This is optimal up to polylogarithmic factors and resolves an open question of [CM] regarding the space complexity of L4,2. We also obtain 1-pass space-optimal algorithms for estimating L¿,k and Lk,¿. (2) We prove a space lower bound of ¿(n1-1/k) on estimating Lk,0 and Lk,1, resolving an open question due to Indyk, IITK Data Streams Workshop (Problem 8), 2006. We also resolve two more questions of [CM] concerning Lk,2 estimation and block heavy hitter problems. Ganguly, Bansal and Dube (FAW, 2008) claimed an O(1)-space algorithm for estimating Lk,p for any k,p ¿ [0,2]. Our lower bounds show this claim is incorrect.
Keywords :
computational complexity; 1-pass space-optimal algorithm; cascaded aggregate; cascaded norms; data stream space complexity; nonnegative integer; polylogarithmic factor; Aggregates; Area measurement; Computer science; Data mining; Explosions; Frequency; Information technology; Internet; Medical services; Statistics;