Title :
Distributed Clustering for Data Sources with Diverse Schema
Author :
Visalakshi, N. Karthikeyani ; Thangavel, K. ; Alagambigai, P.
Author_Institution :
Dept. of Comput. Sci., Vellalar Coll. For Women
Abstract :
Many enterprises integrate information gathered from a variety of data sources into a single input for a learning task. For example, to design an automated diagnostic tool for certain diseases, one may wish to integrate data gathered from many different hospitals. Analyzing and mining such distributed, heterogeneous data sources requires distributed machine learning and data mining techniques. In this paper, a Modified Distributed Combining Algorithm is proposed to cluster disparate data sources that have diverse, possibly overlapping sets of features and need not share objects. First, the objects at each local site are grouped using the K-Means or Fuzzy C-Means clustering algorithm, and the resulting centroids are treated as local models. Next, the local centroids are transformed into a unified structure, and optimum values are assigned to missing attributes. Finally, global cluster centroids are computed to identify the global cluster model, based on a cluster ensemble and centroid mapping. Experiments are carried out on various datasets from the UCI Machine Learning Repository to demonstrate the efficiency of the proposed algorithm.
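The three steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Modified Distributed Combining Algorithm: the abstract does not say how "optimum values" are chosen for missing attributes or how the cluster ensemble and centroid mapping work, so this sketch fills a missing attribute with the mean of that attribute over the local centroids that have it, and obtains global centroids by simply re-running K-Means on the unified local centroids. Only K-Means (not Fuzzy C-Means) is shown, with a deterministic farthest-first initialisation; the function names (`local_models`, `unify`, `global_model`) and the two-site toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = List[float]
Site = Tuple[List[str], List[Point]]  # (feature names, rows) held at one site


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's K-Means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point whose nearest already-chosen centroid is farthest away.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sqdist(p, centroids[j]))
            groups[nearest].append(p)
        # New centroid = group mean; keep the old centroid if its group is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def local_models(sites: List[Site], k: int) -> List[Dict[str, float]]:
    """Step 1: cluster each site on its own schema; its centroids are the local models."""
    models: List[Dict[str, float]] = []
    for features, rows in sites:
        for c in kmeans(rows, k):
            models.append(dict(zip(features, c)))
    return models


def unify(models: List[Dict[str, float]]) -> Tuple[List[str], List[Point]]:
    """Step 2: map every local centroid onto the union of all features.
    ASSUMPTION: a missing attribute gets the mean of that attribute over the
    local centroids that have it (a stand-in for the paper's 'optimum value')."""
    features = sorted({f for m in models for f in m})
    fill: Dict[str, float] = {}
    for f in features:
        vals = [m[f] for m in models if f in m]
        fill[f] = sum(vals) / len(vals)
    return features, [[m.get(f, fill[f]) for f in features] for m in models]


def global_model(models: List[Dict[str, float]], k: int) -> Tuple[List[str], List[Point]]:
    """Step 3 (ASSUMPTION): re-cluster the unified local centroids into k global
    centroids, in place of the paper's cluster-ensemble / centroid-mapping step."""
    features, vectors = unify(models)
    return features, kmeans(vectors, k)


# Toy data (hypothetical): two sites that share only feature "y" and no objects.
site_a: Site = (["x", "y"], [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
site_b: Site = (["y", "z"], [[0.1, 1.0], [0.0, 1.2], [4.8, 9.0], [5.2, 9.1]])
features, centroids = global_model(local_models([site_a, site_b], k=2), k=2)
print(features)  # ['x', 'y', 'z']
for c in centroids:
    print([round(v, 3) for v in c])
```

In the toy run, each site's two local centroids are padded to the union schema [x, y, z] before the global step, so no raw objects leave a site. The fixed farthest-first initialisation keeps the result reproducible; with random seeding, re-clustering a handful of imputed centroids can settle in a poor local optimum.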
Keywords :
data mining; distributed processing; fuzzy set theory; learning (artificial intelligence); pattern clustering; automated diagnostic tool; centroid mapping; cluster ensemble; data mining technique; distributed data sources clustering; distributed heterogeneous data sources; distributed machine learning; diverse schema; fuzzy c-means clustering algorithm; k-means clustering algorithm; learning task; modified distributed combining algorithm; Clustering algorithms; Computer science; Couplings; Data mining; Distributed decision making; Machine learning; Machine learning algorithms; Partitioning algorithms; Robust stability; Unsupervised learning; Distributed Clustering; Diverse Schema; Global Centroid; K-Means; Local Centroid;
Conference_Title :
Convergence and Hybrid Information Technology, 2008. ICCIT '08. Third International Conference on
Conference_Location :
Busan
Print_ISBN :
978-0-7695-3407-7
DOI :
10.1109/ICCIT.2008.282