DocumentCode :
3024071
Title :
Clustering Algorithm Based on Semantic Distance for XML Documents
Author :
Yang, Lingxian ; Gu, Jinguang ; Chen, Heping
Author_Institution :
Coll. of Inf. Sci. & Eng., Wuhan Univ. of Sci. & Technol., Wuhan, China
fYear :
2009
fDate :
25-26 April 2009
Firstpage :
549
Lastpage :
552
Abstract :
As the volume of information grows exponentially, narrowing the search space efficiently and accurately has become a basic requirement for information querying. This paper proposes a semantic-distance-based clustering algorithm for XML documents. The algorithm proceeds in two steps. First, it groups all heterogeneous DTD documents into DTD clusters by using a global semantic dictionary. Second, it computes the semantic distance between the XML documents that correspond to a given DTD cluster, then builds the final XML clusters according to a threshold value given beforehand. Users can locate the relevant document cluster and query within it rather than across all XML documents, so results satisfying the users' requirements can be returned rapidly. The experiments show that the algorithm categorizes documents well and can facilitate information querying.
Keywords :
XML; document handling; pattern clustering; DTD cluster; XML document; clustering algorithm; document type definition; global semantic dictionary; semantic distance; Application software; Clustering algorithms; Computer science; Data engineering; Databases; Dictionaries; Educational institutions; Heuristic algorithms; Information science; XML; Documents clustering; Heterogeneous; Semantic distance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Technology and Applications, 2009 First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3604-0
Type :
conf
DOI :
10.1109/DBTA.2009.134
Filename :
5207698