Title :
A Hybrid Method for XML Clustering
Author :
Piao, Yong ; Liu, Chen ; Wang, Xiu-Kun
Author_Institution :
School of Software, Dalian University of Technology, Dalian, China
Abstract :
An effective XML clustering method, the neighbor center clustering (NCC) algorithm, is presented in this paper. NCC measures document similarity using both the structural and the content information contained in XML files: structural similarity is based on the Longest Common Subsequence (LCS), and content similarity is computed with TF-IDF weighting. NCC reduces computational complexity by avoiding a direct search for cluster centers. Experiments show that NCC achieves high purity and F-measure values and is suitable for clustering XML documents with both homogeneous and heterogeneous structures.
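The abstract names two similarity ingredients: an LCS-based structural measure and a TF-IDF-based content measure. The following is a minimal Python sketch of those two ingredients under stated assumptions, not the authors' implementation. The following choices are illustrative assumptions rather than definitions from the paper: the encoding of a document as its element tags in document order, normalizing the LCS length by the longer sequence, the lowercase alphanumeric tokenizer, the tf * log(N / df) IDF variant with cosine similarity, and the linear blend with the hypothetical weight `alpha`. The NCC clustering step itself is not described in enough detail here to sketch.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_sequence(xml_text):
    """Element tags in document (depth-first) order: an assumed structural encoding."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the Longest Common Subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best value from above or left.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(seq_a, seq_b):
    """LCS length divided by the longer sequence length (assumed normalization)."""
    if not seq_a and not seq_b:
        return 1.0
    return lcs_length(seq_a, seq_b) / max(len(seq_a), len(seq_b))


def tokens(xml_text):
    """Lowercased alphanumeric tokens from all text nodes (assumed tokenizer)."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors, weighting each term by tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(struct_sim, content_sim, alpha=0.5):
    """Hypothetical linear blend; alpha is illustrative, not a parameter from the paper."""
    return alpha * struct_sim + (1 - alpha) * content_sim


if __name__ == "__main__":
    docs = [
        "<book><title>XML clustering</title><author>Piao</author></book>",
        "<book><title>XML mining</title><author>Liu</author></book>",
        "<paper><heading>Graph theory</heading></paper>",
    ]
    seqs = [tag_sequence(d) for d in docs]
    vecs = tfidf_vectors([tokens(d) for d in docs])
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            s = structural_similarity(seqs[i], seqs[j])
            c = cosine(vecs[i], vecs[j])
            print(i, j, round(s, 3), round(c, 3), round(hybrid_similarity(s, c), 3))
```

In this toy input the two `<book>` documents share an identical tag sequence (structural similarity 1.0) but only the term "xml" in their text, so their content similarity is low. The `<paper>` document shares neither tags nor terms with them. Keeping the two scores separate before blending makes it easy to see which kind of information drives a given pair.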
Keywords :
XML; computational complexity; data mining; pattern clustering; text analysis; F-measure value; TF-IDF principle; XML clustering; XML file; computation complexity; content information; extensible markup language; longest common subsequence; neighbor center clustering; structural information; text mining; Accuracy; Clustering algorithms; Data mining; Equations; Feature extraction; Mathematical model; XML; Longest Common Subsequence; neighbor center cluster; structural similarity;
Conference_Title :
2010 Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-9482-8
DOI :
10.1109/PAAP.2010.55