DocumentCode :
2814161
Title :
Semantic Structural Similarity for Clustering XML Documents
Author :
Kim, Tae-Soon ; Lee, Ju-Hong ; Song, Jae-Won
Author_Institution :
Sch. of Comput. Sci. & Eng., Inha Univ., Incheon
fYear :
2008
fDate :
28-30 Aug. 2008
Firstpage :
552
Lastpage :
557
Abstract :
The number of XML documents is increasing rapidly. To analyze the information represented in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise the similarity measure between XML documents to be used for clustering. Because XML documents have a hierarchical structure, it is not appropriate to cluster them using a general document similarity measure. Previous work on similarity measures for XML document clustering considers only structural information and ignores semantic information. In this paper, we propose a novel similarity measure that considers both the structural and the semantic information of XML documents. Our experiments show that the proposed method improves clustering accuracy from the semantic point of view compared with previous work.
Keywords :
XML; document handling; pattern clustering; XML document clustering; data representation; semantic structural similarity; Clustering algorithms; Clustering methods; Computer science; Data mining; HTML; Information analysis; Information retrieval; Information technology; Partitioning algorithms; XML; Semantic Structural Similarity; XML document Clustering; XML document similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Convergence and Hybrid Information Technology, 2008. ICHIT '08. International Conference on
Conference_Location :
Daejeon
Print_ISBN :
978-0-7695-3328-5
Type :
conf
DOI :
10.1109/ICHIT.2008.183
Filename :
4622883