Title :
Clustering Relational Database Entities Using K-means
Author :
Bourennani, Farid ; Guennoun, Mouhcine ; Zhu, Ying
Author_Institution :
Inst. of Technol., Univ. of Ontario, Oshawa, ON, Canada
Abstract :
The rapid evolution of hardware and the internet has made large volumes of data more accessible. This data is composed of heterogeneous data types such as text, numbers, and multimedia. Separate, non-overlapping research communities each work on processing a single homogeneous data type. From the user's perspective, however, these heterogeneous data types should behave and be accessed in a similar fashion. Processing heterogeneous data types, known as Heterogeneous Data Mining (HDM), is a complex task. HDM by Unified Vectorization (HDM-UV) appears to be an appropriate solution to this problem because it allows heterogeneous data types to be processed simultaneously. In this paper, we use K-means and Self-Organizing Maps (SOM) to process textual and numerical data types simultaneously through UV. We evaluate how HDM-UV improves the clustering results of these two algorithms (SOM and K-means) by comparing them to traditional homogeneous data processing. Furthermore, we compare the clustering results of the two algorithms when applied to a data integration problem.
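The idea of unified vectorization followed by K-means can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how text and numbers are encoded, so the TF-IDF weighting, the min-max scaling, the plain concatenation of the two parts, and the sample records are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(texts):
    """Return one TF-IDF vector per text over a shared, sorted vocabulary."""
    docs = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Smoothed inverse document frequency (assumed weighting scheme).
    idf = {w: math.log(n / sum(1 for d in docs if w in d)) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1
        vectors.append([d[w] / total * idf[w] for w in vocab])
    return vectors


def min_max(column):
    """Scale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]


def unified_vectors(texts, numbers):
    """Concatenate textual and numerical features into one vector per record."""
    return [t + [x] for t, x in zip(tfidf_vectors(texts), min_max(numbers))]


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's K-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: a textual description and a numeric attribute.
texts = ["red apple fruit", "green apple fruit",
         "steel bolt hardware", "steel nut hardware"]
numbers = [1.0, 1.2, 9.5, 10.0]

labels = kmeans(unified_vectors(texts, numbers), k=2)
print(labels)
```

Because both data types end up in one vector, a single distance computation reflects textual and numerical similarity at once. The relative weight of the two parts depends on the scaling choices, which are assumptions here.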
Keywords :
data mining; pattern clustering; relational databases; self-organising feature maps; K-means clustering; data integration; heterogeneous data mining; relational database entities; self-organizing maps; unified vectorization; Biomedical measurements; Business; Clustering algorithms; Clustering methods; Companies; Data mining; Hardware; Mining industry; Relational databases; Weight measurement; Data Integration; Heterogeneous data mining; K-means; Pre-Processing; SOM;
Conference_Title :
Advances in Databases Knowledge and Data Applications (DBKDA), 2010 Second International Conference on
Conference_Location :
Menuires
Print_ISBN :
978-1-4244-6081-6
DOI :
10.1109/DBKDA.2010.32