Title :
Research on XML Element Search Results Clustering
Author :
Zhong Min-Juan ; Wan Chang-Xuan ; Liu De-Xi ; Jiao Xian-Pei
Author_Institution :
Sch. of Inf. Technol., Jiangxi Univ. of Finance & Econ., Nanchang, China
Abstract :
Clustering XML search results is an effective way to improve search performance. However, the key problem is how to measure the similarity between XML documents. This paper studies XML search results clustering based on element granularity and proposes a similarity measurement method. The method first uses latent semantic indexing (LSI) to obtain term semantics and then combines the content and semantic structure properties of XML element nodes (CASS). To evaluate clustering performance, two new performance evaluation methodologies, R_ClusterRatio and R_DocuRatio, are introduced. They are motivated by observations of the distribution of relevant documents and by the fact that the experimental data collection, the IEEE CS corpus, does not provide classification information. Experimental results show that the proposed similarity method, which combines term semantics with integrated content and structure semantics (LSI-CASS), is feasible and produces better clustering quality than LSI-CAS.
Keywords :
XML; pattern clustering; XML documents; XML element node content; XML element search results clustering; classification information; clustering performance; clustering quality; element granularity; latent semantic indexing technology; performance evaluation methodology; semantic structure properties; similarity measurement method; structure semantics integration; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Semantics; Singular value decomposition; XML; XML element clustering; content and structure semantic; term semantics;
Conference_Title :
Management of e-Commerce and e-Government (ICMeCG), 2012 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2943-9
DOI :
10.1109/ICMeCG.2012.89