Title :
A Robust Clustering Method for XML Documents
Author :
Zhao, Bin ; Zhang, Yong-Sheng ; Zhang, Hua-Xiang
Author_Institution :
Coll. of Inf. Sci. & Eng., Shandong Normal Univ., Jinan, China
Abstract :
With the growth of XML data on the Internet, managing and analyzing large numbers of XML documents has become an important part of information management. This paper addresses the problem of clustering XML documents. Borrowing the idea of semi-clustering, it proposes a robust clustering method that combines a partitional clustering algorithm with a hierarchical one, overcoming the weaknesses of using either algorithm alone. Experiments on a real XML document collection show that our method can efficiently group large collections of XML documents into appropriate clusters without fixing the number of clusters in advance. Moreover, our method is less sensitive to noise than a single clustering algorithm.
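The abstract does not spell out the algorithm. Purely as an illustration, the following Python sketch shows a generic two-stage scheme of the kind it describes. A partitional pass deliberately over-segments the collection into small sub-clusters. Agglomerative (hierarchical) merging then joins those sub-clusters until no pair is closer than a distance threshold, so the final number of clusters is not fixed beforehand. This is not the authors' method. The tag-path document representation, the Jaccard distance, the k-medoids first stage, average linkage, and the parameters `k` and `threshold` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def tag_paths(xml_text):
    """Hypothetical structural feature: the set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def partition_stage(items, k, iterations=10):
    """Stage 1 (partitional, assumed k-medoids): over-segment into at most k groups.

    Medoids are seeded by farthest-first traversal from item 0, so results are
    deterministic. Assumes a non-empty collection.
    """
    def dist(i, j):
        return jaccard_distance(items[i], items[j])

    n = len(items)
    k = min(k, n)
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        far = max(candidates, key=lambda i: min(dist(i, m) for m in medoids))
        if min(dist(far, m) for m in medoids) == 0.0:
            break  # remaining items duplicate an existing medoid
        medoids.append(far)
    for _ in range(iterations):
        groups = {m: [] for m in medoids}
        for i in range(n):
            groups[min(medoids, key=lambda m: dist(i, m))].append(i)
        # New medoid: the member with the smallest total distance to its group.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, j) for j in members))
            for members in groups.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [members for members in groups.values() if members]


def average_linkage(c1, c2, items):
    """Mean pairwise distance between two clusters of item indices."""
    total = sum(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_stage(clusters, items, threshold):
    """Stage 2 (agglomerative): merge the closest pair until it exceeds threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (a, b), d = min(
            ((pair, average_linkage(clusters[pair[0]], clusters[pair[1]], items))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        clusters[a] += clusters[b]  # a < b, so deleting b leaves a in place
        del clusters[b]
    return clusters


def cluster_xml(docs, k=4, threshold=0.5):
    """Return clusters as sorted lists of document indices."""
    items = [tag_paths(d) for d in docs]
    subclusters = partition_stage(items, k)
    return sorted(sorted(c) for c in hierarchical_stage(subclusters, items, threshold))


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><year/></book>",
        "<book><title/></book>",
        "<movie><title/><director/></movie>",
        "<movie><title/><director/><cast/></movie>",
    ]
    print(cluster_xml(docs))  # → [[0, 1, 2], [3, 4]]
```

In this sketch `k` only needs to be an upper bound; the threshold in the second stage decides how many clusters survive. Medoids are used instead of means because a medoid is always an actual document and is generally less pulled toward outliers, which is one plausible way a hybrid scheme can reduce sensitivity to noise.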
Keywords :
Internet; XML; information management; XML data; XML documents; robust clustering method; single clustering algorithm; Clustering algorithms; Clustering methods; Data mining; Industrial engineering; Innovation management; Partitioning algorithms; Robustness;
Conference_Title :
Information Management, Innovation Management and Industrial Engineering, 2008. ICIII '08. International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-0-7695-3435-0
DOI :
10.1109/ICIII.2008.181