DocumentCode :
2507203
Title :
KLNCC: A new nonlinear correlation clustering algorithm based on KL-divergence
Author :
Sha, Chaofeng ; Qiu, Xipeng ; Zhou, Aoying
Author_Institution :
Dept. of Comput. Sci. & Eng., Fudan Univ., Shanghai
fYear :
2008
fDate :
8-11 July 2008
Firstpage :
125
Lastpage :
130
Abstract :
The problem of finding correlations among subsets of features in high-dimensional data arises in many applications. Much work has been done on finding such correlations, including both linear and nonlinear correlation clusters. In this paper, we present KLNCC, a novel nonlinear correlation clustering algorithm that adopts a dynamic two-phase approach. In the first phase, we find microclusters using the EM algorithm. In the second phase, these microclusters are merged in a bottom-up manner, resulting in a dendrogram from which the user selects the final clustering. When merging microclusters, we adopt the KL-divergence as the distance between two microclusters; it has an explicit form when the microclusters are found with the EM clustering algorithm. Our experimental evaluation on several real datasets demonstrates that KLNCC discovers meaningful and accurate nonlinear correlation clusters.
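The explicit form mentioned in the abstract can be illustrated with a minimal sketch. It assumes each microclusters is summarized as a Gaussian with diagonal covariance (a simplification chosen here so the example needs only the standard library; the abstract does not state the covariance structure). Because KL-divergence is asymmetric, the sketch sums the two directions to obtain a symmetric distance, which is also an assumption and not necessarily the paper's choice. The names `kl_diag_gaussians`, `symmetric_kl`, and `closest_pair` are hypothetical.

```python
import math
from itertools import combinations


def kl_diag_gaussians(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for Gaussians with diagonal covariance.

    Assumption: each microcluster is described by a mean vector and a
    vector of per-dimension variances (all variances must be positive).
    """
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        total += v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
    return 0.5 * total


def symmetric_kl(a, b):
    """Symmetrized KL between two (mean, variance) pairs.

    Assumption: summing both directions is one common way to turn the
    asymmetric KL-divergence into a distance-like quantity.
    """
    return (kl_diag_gaussians(a[0], a[1], b[0], b[1])
            + kl_diag_gaussians(b[0], b[1], a[0], a[1]))


def closest_pair(clusters):
    """Return the index pair (i, j), i < j, with the smallest symmetric KL.

    This is the pair a bottom-up merge step would combine next.
    """
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: symmetric_kl(clusters[ij[0]], clusters[ij[1]]),
    )


# Three hypothetical microclusters as (mean, variance) pairs.
clusters = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.1, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [2.0, 2.0]),
]
pair = closest_pair(clusters)
```

Repeating the closest-pair selection and replacing the chosen pair with a merged microcluster would yield the dendrogram described above; how the merged cluster's parameters are computed is left out here because the abstract does not specify it.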
Keywords :
data handling; expectation-maximisation algorithm; EM clustering algorithm; KL-divergence; KLNCC; dynamic two-phase approach; high-dimensional data; micro clusters; nonlinear correlation clustering algorithm; Application software; Chaos; Clustering algorithms; Computer science; Data engineering; Databases; Gaussian processes; Iterative algorithms; Merging; Principal component analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-2357-6
Electronic_ISBN :
978-1-4244-2358-3
Type :
conf
DOI :
10.1109/CIT.2008.4594661
Filename :
4594661