DocumentCode :
479545
Title :
High dimensional sparse data Clustering Algorithm Based on Concept Feature Vector (CABOCFV)
Author :
Wu, Sen ; Gu, Shujuan ; Gao, Xuedong
Author_Institution :
Sch. of Econ. & Manage., Univ. of Sci. & Technol., Beijing
Volume :
1
fYear :
2008
fDate :
12-15 Oct. 2008
Firstpage :
202
Lastpage :
206
Abstract :
Finding clusters of data objects in high-dimensional space is challenging, especially because such data can be sparse and highly skewed. This paper focuses on using the concept lattice to solve the high-dimensional sparse data clustering problem. Concept lattice theory is an effective tool for data analysis and knowledge processing: it integrates the concept intent (attributes) and the concept extent (objects) and describes the hierarchical relationships among concept nodes. Constructing a concept lattice is itself a process of concept clustering, but because the lattice is complete, it produces a huge number of concept nodes. However, concept nodes whose extent is too large or too small are not of interest. This paper proposes an effective high-dimensional sparse data clustering algorithm based on the concept feature vector (CABOCFV), which uses the concept sparse feature distance and the concept feature vector to reduce redundancy in concept construction, and introduces an effective noise recognition strategy. The CABOCFV clustering algorithm is insensitive to the input order of data objects and scans the database only once. Experiments show that CABOCFV is effective and efficient for high-dimensional sparse data clustering.
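The abstract describes a single-scan clustering scheme that keeps a compact feature summary per cluster, measures a sparse feature distance, and discards clusters whose extent is too small as noise. The exact definitions of CABOCFV's concept feature vector and concept sparse feature distance are not given in this record, so the sketch below is only an illustration under stated assumptions. Each object is treated as a set of attribute indices; each cluster is summarized by a hypothetical `ClusterSummary` (object count, attributes shared by all members, attributes held by any member); and the hypothetical `merged_distance` is the fraction of the merged attribute union that is not shared by every member. Unlike the method claimed in the abstract, this simple leader-style pass is sensitive to input order.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster summary (NOT the paper's definition):
    object count, attributes shared by every member (intersection),
    and attributes held by at least one member (union)."""
    n: int
    shared: frozenset
    union: frozenset
    members: list = field(default_factory=list)


def merged_distance(c, obj):
    """Hypothetical sparse dissimilarity of cluster c after adding obj:
    the fraction of the merged union not shared by all members."""
    shared = c.shared & obj
    union = c.union | obj
    if not union:
        return 0.0
    return (len(union) - len(shared)) / len(union)


def single_pass_cluster(objects, threshold=0.5, min_size=2):
    """Scan the objects once; merge each into the closest cluster if the
    merged dissimilarity is within threshold, otherwise open a new cluster.
    Clusters smaller than min_size are reported as noise (an assumed rule)."""
    clusters = []
    for idx, obj in enumerate(objects):
        obj = frozenset(obj)
        best, best_d = None, None
        for c in clusters:
            d = merged_distance(c, obj)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            # Update the summary incrementally; members are never rescanned.
            best.n += 1
            best.shared &= obj
            best.union |= obj
            best.members.append(idx)
        else:
            clusters.append(ClusterSummary(1, obj, obj, [idx]))
    kept = [c.members for c in clusters if c.n >= min_size]
    noise = [m for c in clusters if c.n < min_size for m in c.members]
    return kept, noise


if __name__ == "__main__":
    data = [{1, 2, 3}, {1, 2}, {1, 2, 3, 4}, {7, 8}, {7, 8, 9}, {20}]
    print(single_pass_cluster(data, threshold=0.5, min_size=2))
```

On the toy data, objects 0 to 2 and objects 3 to 4 form two clusters, and object 5 ends up alone and is flagged as noise. Keeping only the count, intersection, and union per cluster is what allows a one-pass update without revisiting earlier objects; the threshold and minimum cluster size are illustrative parameters, not values from the paper.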
Keywords :
data analysis; data mining; pattern clustering; vectors; concept extent; concept feature vector; concept intent; concept lattice; concept sparse feature distance; data object cluster; high dimensional sparse data clustering algorithm; knowledge processing; Clustering algorithms; Computational complexity; Data analysis; Discrete wavelet transforms; Lattices; Noise reduction; Space technology; Spatial databases; Technology management; Vectors; Clustering Analysis; Concept Lattice Construction; High Dimensional Data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Service Operations and Logistics, and Informatics, 2008. IEEE/SOLI 2008. IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2012-4
Electronic_ISBN :
978-1-4244-2013-1
Type :
conf
DOI :
10.1109/SOLI.2008.4686391
Filename :
4686391