DocumentCode
2533284
Title
A Hybrid Method for XML Clustering
Author
Piao, Yong ; Liu, Chen ; Wang, Xiu-Kun
Author_Institution
Sch. of Software, Dalian Univ. of Technol., Dalian, China
fYear
2010
fDate
18-20 Dec. 2010
Firstpage
286
Lastpage
290
Abstract
This paper presents an effective XML clustering method, the neighbor center clustering (NCC) algorithm, whose similarity measure combines the structural and content information contained in XML files. Structural similarity is measured using the Longest Common Subsequence, while content similarity is computed using TF-IDF principles. NCC reduces computational complexity by avoiding a direct search for cluster centers. Experiments show that NCC achieves high purity and F-measure values and is suitable for clustering XML documents with both homogeneous and heterogeneous structures.
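The two similarity components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the normalization of the LCS length, the smoothed IDF, the cosine comparison, and the weighting parameter `alpha` are all assumptions, and every function name here is hypothetical.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(tags_a, tags_b):
    """Assumed normalization: LCS length over the longer tag sequence."""
    if not tags_a or not tags_b:
        return 0.0
    return lcs_length(tags_a, tags_b) / max(len(tags_a), len(tags_b))


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors; docs is a list of token lists.

    Assumption: idf = log(n / df) + 1, so terms shared by all
    documents keep a non-zero weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(tags_a, tags_b, vec_a, vec_b, alpha=0.5):
    """Assumed linear blend; alpha is a hypothetical weight, not from the paper."""
    return alpha * structural_similarity(tags_a, tags_b) + (1 - alpha) * cosine(vec_a, vec_b)
```

Here `tags_a` and `tags_b` stand for the element-tag sequences of two XML documents (for example, in document order), and `vec_a` and `vec_b` for the TF-IDF vectors of their text content. How the paper derives these sequences and combines the two scores is not stated in the abstract.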
Keywords
XML; computational complexity; data mining; pattern clustering; text analysis; F-measure value; TF-IDF principle; XML clustering; XML file; computation complexity; content information; extensible markup language; longest common subsequence; neighbor center clustering; structural information; text mining; Accuracy; Clustering algorithms; Data mining; Equations; Feature extraction; Mathematical model; XML; Longest Common Subsequence; neighbor center cluster; structural similarity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
Conference_Location
Dalian
Print_ISBN
978-1-4244-9482-8
Type
conf
DOI
10.1109/PAAP.2010.55
Filename
5715096