Title :
Efficient algorithm for projected clustering
Author :
Ng Ka Ka, Eric; Fu, Ada Wai-Chee
Author_Institution :
Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
Abstract :
With high-dimensional data, natural clusters are expected to exist in different subspaces. We propose the EPC (efficient projected clustering) algorithm to discover the sets of correlated dimensions and the locations of the clusters. This algorithm is quite different from previous approaches and has the following advantages: (1) it requires no input regarding the number of natural clusters or the average cardinality of the subspaces; (2) it can handle clusters of irregular shapes; (3) it produces better clustering results than the best previous method; (4) it is highly scalable. In experiments, it is several times faster than the previous method while producing more accurate results.
Keywords :
correlation methods; data mining; pattern clustering; EPC algorithm; average subspace cardinality; cluster location discovery; correlated dimensions discovery; efficient projected clustering algorithm; high-dimensional data; irregular cluster shapes; natural data clusters; scalability; Clustering algorithms; Data analysis; Data engineering; Histograms; Linear approximation; Partitioning algorithms; Scalability; Statistical analysis; Testing;
Conference_Title :
Data Engineering, 2002. Proceedings. 18th International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1531-2
DOI :
10.1109/ICDE.2002.994727