DocumentCode
841514
Title
A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems
Author
Wolff, Ran ; Bhaduri, Kanishka ; Kargupta, Hillol
Author_Institution
Dept. of Manage. Inf. Syst., Haifa Univ., Haifa
Volume
21
Issue
4
fYear
2009
fDate
4/1/2009 12:00:00 AM
Firstpage
465
Lastpage
478
Abstract
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of the function that models the global state (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, the models must be kept up to date. Computing global data mining models (e.g., decision trees or k-means clustering) in large distributed systems may be very costly, both because of the scale of the system and because communication costs may be high. The cost increases further in a dynamic scenario in which the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm that can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for monitoring complex functions of the data, such as its k-means clustering. The theoretical claims are corroborated by a thorough experimental analysis.
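The two-step idea of monitoring a model and rebuilding it only when the monitor fires can be illustrated with a small sketch. This is not the paper's local algorithm. The staleness test used here (a model is stale when the mean of the points assigned to any centroid drifts more than `epsilon` from that centroid) is an assumption made for illustration. The sketch also evaluates that test centrally over all peers' data, which is exactly the cost the paper's local algorithm is designed to avoid. All function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def cluster_means(points, centroids):
    """Mean of the points assigned to each centroid (None for an empty cluster)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        i = nearest(p, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else None
            for i in range(k)]


def model_is_stale(peer_data, centroids, epsilon):
    """Hypothetical monitoring condition, evaluated centrally for simplicity:
    the model is stale if any cluster mean lies more than `epsilon` from its centroid."""
    all_points = [p for data in peer_data for p in data]
    means = cluster_means(all_points, centroids)
    return any(m is not None and math.dist(m, c) > epsilon
               for m, c in zip(means, centroids))


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations, used here as the model-rebuild step."""
    for _ in range(iterations):
        means = cluster_means(points, centroids)
        centroids = [m if m is not None else c for m, c in zip(means, centroids)]
    return centroids


def feedback_step(peer_data, centroids, epsilon):
    """One round of the monitor/rebuild loop: recompute only when the monitor fires.
    Returns (centroids, rebuilt_flag)."""
    if model_is_stale(peer_data, centroids, epsilon):
        all_points = [p for data in peer_data for p in data]
        return kmeans(all_points, centroids), True
    return centroids, False
```

For example, with two peers holding points around (0.5, 0.5) and (10.5, 10.5), the centroids `[(0.5, 0.5), (10.5, 10.5)]` pass the monitor and are kept. If every point then shifts by 3 along x, the monitor fires and the rebuild returns `[(3.5, 0.5), (13.5, 10.5)]`. The design point is that the expensive rebuild runs only when the cheap monitor reports a change.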
Keywords
data mining; distributed processing; pattern clustering; data stream mining; decision trees; generic local algorithm; global data mining models; information retrieval; k-means clustering; large distributed systems; load sharing; message routing; Data mining; Distributed databases; Distributed systems; Information Storage and Retrieval; Information Technology; Mining methods and algorithms; Peer to Peer Data Mining; Systems and Software;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
ieee
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2008.169
Filename
4604665