Title :
Scalable data mining with log based consistency DSM for high performance distributed computing
Author :
Hirayama, Hideaki ; Honda, Hiroki ; Yuba, Toshitsugu
Author_Institution :
Graduate Sch. of Inf. Syst., Univ. of Electro-Commun., Tokyo, Japan
Abstract :
Mining large Web-based online distributed databases to discover new knowledge and achieve financial gain is an important research problem. These computations require high-performance distributed and parallel computing environments. Traditional data mining techniques such as classification, association, and clustering can be extended to find new, efficient solutions. The paper presents the scalable data mining problem, proposes software DSM (distributed shared memory) with a new mechanism as an effective solution, and discusses both the implementation and the performance evaluation results. It is observed that the overhead of a software DSM is very large for scalable data mining programs. A new Log Based Consistency (LBC) mechanism, designed specifically for scalable data mining on software DSM, is proposed to overcome this overhead. Traditional association-rule-based data mining programs frequently modify the same fields through count-up operations. In contrast, the LBC mechanism maintains consistency by broadcasting the count-up operation logs among the multiple nodes.
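The idea of keeping replicas consistent by exchanging count-up logs, rather than the modified memory itself, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Node`, `count_up`, `drain_log`, `apply_log`, `synchronize`, and `count_items` are hypothetical, the "broadcast" is an in-memory loop rather than network traffic, and the choice of a single synchronization point after counting is an assumption. It relies on the fact that count-up operations are commutative, so every node reaches the same totals regardless of the order in which logs arrive.

```python
from collections import Counter


class Node:
    """Hypothetical node holding a local replica of the shared counters."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = Counter()  # local replica of the shared count fields
        self.log = Counter()     # count-up operations not yet broadcast

    def count_up(self, key, amount=1):
        # Update the local replica and record the operation in the log.
        self.counts[key] += amount
        self.log[key] += amount

    def drain_log(self):
        # Hand over the pending log and start a fresh one.
        log, self.log = self.log, Counter()
        return log

    def apply_log(self, log):
        # Replay count-up operations received from another node.
        for key, amount in log.items():
            self.counts[key] += amount


def synchronize(nodes):
    """Simulated broadcast: every node's log is applied on all other nodes."""
    logs = [(node, node.drain_log()) for node in nodes]
    for sender, log in logs:
        for receiver in nodes:
            if receiver is not sender:
                receiver.apply_log(log)


def count_items(partitions):
    """Count item occurrences, one transaction partition per node (assumed layout)."""
    nodes = [Node(i) for i in range(len(partitions))]
    for node, transactions in zip(nodes, partitions):
        for transaction in transactions:
            for item in set(transaction):
                node.count_up(item)
    synchronize(nodes)
    return nodes
```

For example, calling `count_items` with two partitions of transactions leaves every node with identical totals in `counts`, while only the compact per-key increments were exchanged between nodes.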
Keywords :
data integrity; data mining; distributed databases; distributed shared memory systems; information resources; very large databases; LBC mechanism; Web based online distributed database mining; association rule based data mining programs; count-up operation logs; count-up operations; data mining techniques; distributed shared memory; high performance distributed computing; log based consistency DSM; multiple nodes; parallel computing environments; performance evaluation results; research problem; scalable data mining; scalable data mining problem; scalable data mining programs; software DSM; Association rules; Bayesian methods; Broadcasting; Clustering algorithms; Data mining; Databases; Distributed computing; High performance computing; Information systems; Software performance;
Conference_Title :
Engineering of Complex Computer Systems, 2000. ICECCS 2000. Proceedings. Sixth IEEE International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-7695-0583-X
DOI :
10.1109/ICECCS.2000.873938