DocumentCode :
2515133
Title :
Clustering XML Search Results Based on Content and Structure Similarity
Author :
Min-Juan, Zhong ; Chang-Xuan, Wan ; De-Xi, Liu ; Xian-Pei, Jiao
Author_Institution :
Sch. of Inf. & Technol., Jiangxi Univ. of Finance & Econ., Nanchang, China
fYear :
2011
fDate :
5-6 Nov. 2011
Firstpage :
10
Lastpage :
14
Abstract :
Clustering XML search results is an effective way to improve performance. However, the key problem is how to measure the similarity between XML documents. In this paper, we propose a semantic similarity measure that combines content with structure, in which a variety of XML document features, including term element frequency, term inverse element frequency, the semantic weight of the tag label, and the level information of the term, are analyzed and applied to compute the similarity between XML documents. In addition, two new performance evaluation measures for clustering quality, namely ClusterRatio_Relevant and DocuRatio_Relevant, are introduced, motivated by observations of the distribution of relevant documents and by the fact that the collection has no classification information. Experimental results show that the proposed similarity method (CAS measure) outperforms traditional document clustering (CO measure) in ClusterRatio_Relevant and DocuRatio_Relevant and produces better clustering quality.
Keywords :
XML; document handling; pattern classification; pattern clustering; ClusterRatio_Relevant; DocuRatio_Relevant; XML documents; XML search results clustering; classification information; content similarity; documents distribution; structure similarity; Educational institutions; Frequency measurement; Multimedia systems; Performance evaluation; Semantics; Weight measurement; XML Clustering; node level; relevant cluster ratio; relevant document distribution ratio; tag weight
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Management of e-Commerce and e-Government (ICMeCG), 2011 Fifth International Conference on
Conference_Location :
Hubei
Print_ISBN :
978-1-4577-1659-1
Type :
conf
DOI :
10.1109/ICMeCG.2011.28
Filename :
6092622