Regional Information Center for Science and Technology (RICEST) - Clustering XML Documents Based on Data Type

DocumentCode :

2001143

Title :

Clustering XML Documents Based on Data Type

Author :

Zhou, Chong ; Lu, Yansheng

Author_Institution :

Coll. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., China

Volume :

fYear :

2008

fDate :

13-17 Dec. 2008

Firstpage :

122

Lastpage :

127

Abstract :

Existing so-called semantic XML document clustering algorithms usually use a synonymous word library to calculate semantic similarities among XML documents. However, when people create their own XML documents, they name elements arbitrarily and often use many abbreviations, so many tags are not real words at all. XML documents created by different people may therefore look very different from each other even if they describe the same object, and the traditional methods do not work well in such cases. To address this problem, we propose a novel similarity measure standard based on the data-type tree, a model that integrates the data types and tags of XML documents. A clustering algorithm, DT²K-means, is also proposed to cluster XML documents. Experimental results on real-world data sets show that DT²K-means correctly groups semantically similar XML documents that contain different tags but describe the same object.
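The abstract does not define the data-type tree or DT²K-means, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes a hypothetical set of type-inference rules (`infer_type`), a simplified profile of (depth, data type) pairs in place of a real data-type tree, and Jaccard overlap as a stand-in similarity measure. It illustrates how two documents with unrelated tag names can still compare as similar when their values have the same data types.

```python
import re
import xml.etree.ElementTree as ET


def infer_type(text):
    """Guess a coarse data type for a leaf value (hypothetical rules)."""
    value = (text or "").strip()
    if not value:
        return "empty"
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"


def type_profile(xml_text):
    """Return the set of (depth, data type) pairs over all leaf elements.

    Tag names are deliberately ignored here; this is a simplification
    made for illustration, not the data-type tree from the paper.
    """
    root = ET.fromstring(xml_text)
    profile = set()

    def walk(node, depth):
        children = list(node)
        if not children:
            profile.add((depth, infer_type(node.text)))
        for child in children:
            walk(child, depth + 1)

    walk(root, 0)
    return profile


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two book records with different, abbreviated tags, plus an unrelated log.
doc_a = ("<book><title>XML Basics</title><price>29.95</price>"
         "<published>2008-12-13</published></book>")
doc_b = ("<bk><ttl>Data Mining</ttl><prc>41.50</prc>"
         "<pubdt>2007-03-01</pubdt></bk>")
doc_c = "<log><entry><code>404</code></entry><entry><code>500</code></entry></log>"

sim_ab = jaccard(type_profile(doc_a), type_profile(doc_b))
sim_ac = jaccard(type_profile(doc_a), type_profile(doc_c))
print(sim_ab, sim_ac)
```

In this toy setup the two book records share an identical type profile despite having no tag in common, while the log document shares none. A pairwise similarity of this kind could then feed a K-means-style clustering step, though how the paper combines tags and data types is not stated in this record.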

Keywords :

XML; pattern clustering; data type tree; eXtensible Markup Language; semantic XML document clustering algorithm; semantic similarity; similarity measure standard; synonymous word library; Clustering algorithms; Computational intelligence; Computer science; Computer security; Data security; Educational institutions; Libraries; Measurement standards; Stability; clustering; data type

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Computational Intelligence and Security, 2008. CIS '08. International Conference on

Conference_Location :

Suzhou

Print_ISBN :

978-0-7695-3508-1

Type :

conf

DOI :

10.1109/CIS.2008.90

Filename :

4724749

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2001143