Title :
Supporting self-adaptation in streaming data mining applications
Author :
Chen, Liang ; Agrawal, Gagan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
There are many application classes where the users are flexible with respect to the output quality. At the same time, there are other constraints, such as the need for real-time or interactive response, which are more crucial. This paper presents and evaluates a runtime algorithm for supporting adaptive execution for such applications. The particular domain we target is distributed data mining on streaming data. This work has been done in the context of a middleware system called GATES (grid-based adaptive execution on streams) that we have been developing. The self-adaptation algorithm we present and evaluate in this paper has the following characteristics. First, it carefully evaluates the long-term load at each processing stage. It considers different possibilities for the load at a processing stage and its next stages, and decides if the value of an adaptation parameter needs to be modified, and if so, in which direction. To find the ideal new value of an adaptation parameter, it performs a binary search on the specified range of the parameter. To evaluate the self-adaptation algorithm in our middleware, we have implemented two streaming data mining applications. The main observations from our experiments are as follows. First, our algorithm is able to quickly converge to stable values of the adaptation parameter, for different data arrival rates, and independent of the specified initial value. Second, in a dynamic environment, the algorithm is able to adapt the processing rapidly. Finally, in both static and dynamic environments, the algorithm clearly outperforms the algorithm described in our earlier work and an obvious alternative, which is based on linear-updates.
Keywords :
data mining; grid computing; middleware; self-adjusting systems; distributed data mining; grid-based adaptive execution; middleware system; self-adaptation; streaming data mining; Application software; Computer science; Data mining; Data visualization; Heuristic algorithms; Middleware; Program processors; Real time systems; Runtime; Streaming media;
Conference_Titel :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Print_ISBN :
1-4244-0054-6
DOI :
10.1109/IPDPS.2006.1639312