DocumentCode :
397064
Title :
A general framework for clustering high-dimensional datasets
Author :
Zhao, Yanchang ; Song, Junde
Author_Institution :
Beijing Univ. of Posts & Telecommun., China
Volume :
2
fYear :
2003
fDate :
4-7 May 2003
Firstpage :
1091
Abstract :
In many fields, the datasets used in data mining applications are usually high-dimensional. Most existing clustering algorithms are effective and efficient when the dimensionality is low, but their performance and effectiveness degrade when the data space is high-dimensional. One reason is that their complexity increases exponentially with the dimensionality. To address this problem, we propose a general framework for clustering high-dimensional datasets. When combined with our framework, common clustering algorithms can be applied to high-dimensional datasets efficiently. In our framework, a high-dimensional clustering task is broken into several one- or two-dimensional clustering phases, each of which involves only one or two dimensions. In this way, common algorithms for clustering low-dimensional datasets can be used to process high-dimensional ones. In addition, attributes of different types can be processed with different algorithms in separate phases, so datasets of hybrid data types can be handled easily. The efficiency and effectiveness of our framework are demonstrated in our experiments.
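The abstract does not say how the results of the separate phases are combined. A minimal sketch, assuming each phase refines the current clusters using a single attribute and a clusterer chosen for that attribute's type, might look like the following. The functions `gap_cluster_1d`, `category_cluster_1d`, and `phased_cluster`, and the `max_gap` threshold, are hypothetical illustrations rather than the authors' algorithm; two-dimensional phases, which the abstract also mentions, are omitted for brevity.

```python
from typing import Callable, Dict, Hashable, List, Sequence

# A clusterer maps the values of one attribute to integer cluster labels.
Clusterer = Callable[[List], List[int]]


def gap_cluster_1d(values: List[float], max_gap: float = 1.0) -> List[int]:
    """Hypothetical 1-D clusterer for numeric values: sort the values and
    start a new cluster wherever two consecutive values differ by > max_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)  # the smallest value keeps label 0
    current = 0
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] > max_gap:
            current += 1
        labels[idx] = current
    return labels


def category_cluster_1d(values: List[Hashable]) -> List[int]:
    """Hypothetical 1-D clusterer for categorical values:
    one cluster per distinct value, numbered in order of first appearance."""
    ids: Dict[Hashable, int] = {}
    return [ids.setdefault(v, len(ids)) for v in values]


def phased_cluster(records: Sequence[Sequence], clusterers: Sequence[Clusterer]) -> List[List[int]]:
    """Phase d splits every current cluster using only attribute d and the
    clusterer chosen for it. Returns clusters as lists of record indices."""
    clusters = [list(range(len(records)))]
    for dim, clusterer in enumerate(clusterers):
        refined: List[List[int]] = []
        for members in clusters:
            # Only one attribute is involved in this phase.
            labels = clusterer([records[i][dim] for i in members])
            groups: Dict[int, List[int]] = {}
            for i, lab in zip(members, labels):
                groups.setdefault(lab, []).append(i)
            refined.extend(groups.values())
        clusters = refined
    return clusters


# Hybrid-type data: two numeric attributes and one categorical attribute.
data = [
    (1.0, 10.0, "a"),
    (1.2, 10.5, "a"),
    (1.1, 30.0, "b"),
    (8.0, 10.2, "a"),
    (8.3, 10.1, "a"),
]
result = phased_cluster(data, [gap_cluster_1d, gap_cluster_1d, category_cluster_1d])
print(result)  # → [[0, 1], [2], [3, 4]]
```

Choosing a different clusterer per attribute is what lets numeric and categorical attributes be handled in separate phases; any low-dimensional algorithm with the same input/output shape could be swapped in.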
Keywords :
data mining; pattern clustering; clustering high-dimensional datasets; data mining applications; hybrid data types; two-dimensional clustering phases; Algorithm design and analysis; Clustering algorithms; Data mining; Degradation; Performance analysis; Telecommunications;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on
ISSN :
0840-7789
Print_ISBN :
0-7803-7781-8
Type :
conf
DOI :
10.1109/CCECE.2003.1226086
Filename :
1226086