Title :
Semantic Structural Similarity for Clustering XML Documents
Author :
Kim, Tae-Soon ; Lee, Ju-Hong ; Song, Jae-Won
Author_Institution :
Sch. of Comput. Sci. & Eng., Inha Univ., Incheon
Abstract :
The number of XML documents is increasing rapidly. To analyze the information represented in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise a similarity measure between XML documents for use in clustering. Since XML documents have a hierarchical structure, it is not appropriate to cluster them using a general document similarity measure. Previous work on similarity measures for XML document clustering considers only structural information and ignores semantic information. In this paper, we propose a novel similarity measure that considers both the structural and the semantic information of XML documents. Our experiments show that the proposed method improves clustering accuracy from the semantic point of view, compared to previous work.
Keywords :
XML; document handling; pattern clustering; XML document clustering; data representation; semantic structural similarity; Clustering algorithms; Clustering methods; Computer science; Data mining; HTML; Information analysis; Information retrieval; Information technology; Partitioning algorithms; XML document similarity
Conference_Title :
Convergence and Hybrid Information Technology, 2008. ICHIT '08. International Conference on
Conference_Location :
Daejeon
Print_ISBN :
978-0-7695-3328-5
DOI :
10.1109/ICHIT.2008.183