DocumentCode :
2533284
Title :
A Hybrid Method for XML Clustering
Author :
Piao, Yong ; Liu, Chen ; Wang, Xiu-Kun
Author_Institution :
Sch. of Software, Dalian Univ. of Technol., Dalian, China
fYear :
2010
fDate :
18-20 Dec. 2010
Firstpage :
286
Lastpage :
290
Abstract :
This paper presents an effective XML clustering method, the neighbor center clustering algorithm (NCC), which measures similarity between XML files using both their structural and their content information. Structural similarity is measured with the Longest Common Subsequence (LCS), while content similarity is computed with TF-IDF weighting. NCC reduces computational complexity by avoiding a direct search for cluster centers. Experiments show that NCC achieves high purity and F-measure values and is suitable for clustering XML documents with both homogeneous and heterogeneous structures.
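The two similarity measures named in the abstract can be illustrated with the minimal Python sketch below. It does not reproduce the NCC algorithm or the paper's exact formulas. Several choices here are assumptions rather than details from the abstract: the structure of a file is encoded as its pre-order sequence of element tags, LCS length is normalized by the longer sequence, TF-IDF uses a smoothed IDF with cosine similarity, and the two scores are blended with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_sequence(xml_text):
    """Element tags in pre-order (document) order: one assumed structural encoding."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def content_terms(xml_text):
    """Lower-cased alphanumeric tokens taken from all text nodes."""
    text = " ".join(ET.fromstring(xml_text).itertext())
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Longest common subsequence length via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def structural_similarity(tags_a, tags_b):
    # Assumption: normalize LCS length by the longer tag sequence (range 0..1).
    longest = max(len(tags_a), len(tags_b))
    return lcs_length(tags_a, tags_b) / longest if longest else 1.0


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw term count times a smoothed IDF (assumed variant)."""
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(terms).items()} for terms in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(s_struct, s_content, alpha=0.5):
    # Hypothetical linear blend; the abstract does not state how the scores combine.
    return alpha * s_struct + (1 - alpha) * s_content


if __name__ == "__main__":
    docs = [
        "<book><title>XML clustering</title><author>Piao</author></book>",
        "<book><title>XML mining</title><author>Liu</author></book>",
        "<movie><name>Clustering</name></movie>",
    ]
    tags = [tag_sequence(d) for d in docs]
    vecs = tfidf_vectors([content_terms(d) for d in docs])
    for i, j in [(0, 1), (0, 2)]:
        s = hybrid_similarity(structural_similarity(tags[i], tags[j]),
                              cosine(vecs[i], vecs[j]))
        print(i, j, round(s, 3))
```

In this toy run, the two `book` files share their full tag sequence, so they score higher overall than the `book`/`movie` pair even though the latter share the term "clustering". A pairwise score of this kind is the input a clustering step such as NCC would consume. How NCC itself selects neighbor centers is not described in the abstract and is not modeled here.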
Keywords :
XML; computational complexity; data mining; pattern clustering; text analysis; F-measure value; TF-IDF principle; XML clustering; XML file; computation complexity; content information; extensible markup language; longest common subsequence; neighbor center clustering; structural information; text mining; Accuracy; Clustering algorithms; Data mining; Equations; Feature extraction; Mathematical model; XML; Longest Common Subsequence; neighbor center cluster; structural similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-9482-8
Type :
conf
DOI :
10.1109/PAAP.2010.55
Filename :
5715096