Title :
An Improved XML Document Clustering Using Path Feature
Author :
Yuan, Jin-sha ; Li, Xin-ye ; Ma, Li-na
Author_Institution :
Dept. of Electron. & Commun. Eng., North China Electr. Power Univ., Baoding
Abstract :
Extensible markup language (XML) document clustering is useful for XML applications such as XML search engines. The element tags and their positions in the document's hierarchy provide valuable information for clustering XML documents. An XML path can represent both the element tags and their position information. Since common XPath represents only part of the XML structure, using common XPath as the XML structural representation is not always efficient for XML clustering, especially when the documents have dissimilar structures. In this paper, we use all paths of length less than or equal to L as feature vectors for XML documents. Since the feature vector matrix is usually sparse, we use a bipartite graph to express the association relation between XML documents and path features. Based on this idea, we improve the path-based XML clustering algorithm. Experiments are described to demonstrate its efficiency.
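The abstract does not spell out the feature definition or the clustering step. The sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm. It assumes a "path of length k" is any downward chain of k element tags (not necessarily starting at the root), weights each path by its occurrence count, stores the sparse document-by-path matrix as a bipartite adjacency map in both directions, and uses cosine similarity as a stand-in measure that a clustering algorithm could consume. The function names `path_features`, `build_bipartite` and `similar_documents` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from math import sqrt


def path_features(xml_text: str, max_len: int) -> Counter:
    """Count every downward tag path of 1..max_len elements (assumed definition)."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(node, ancestors):
        # chain = tags from the root down to this node
        chain = ancestors + [node.tag]
        # each path is identified by its last node and its length k
        for k in range(1, min(max_len, len(chain)) + 1):
            counts["/".join(chain[-k:])] += 1
        for child in node:
            walk(child, chain)

    walk(root, [])
    return counts


def build_bipartite(docs: dict, max_len: int):
    """Sparse bipartite graph stored both ways: doc -> {path: w}, path -> {doc: w}."""
    doc_to_path = {d: dict(path_features(x, max_len)) for d, x in docs.items()}
    path_to_doc = defaultdict(dict)
    for d, feats in doc_to_path.items():
        for p, w in feats.items():
            path_to_doc[p][d] = w
    return doc_to_path, dict(path_to_doc)


def similar_documents(doc, doc_to_path, path_to_doc):
    """Cosine similarity of `doc` to every document sharing at least one path.

    Walking doc -> path -> doc edges touches only non-zero matrix entries,
    which is the point of using a bipartite graph for a sparse matrix.
    """
    dot = defaultdict(float)
    for p, w in doc_to_path[doc].items():
        for other, w2 in path_to_doc[p].items():
            if other != doc:
                dot[other] += w * w2

    def norm(d):
        return sqrt(sum(v * v for v in doc_to_path[d].values()))

    return {o: s / (norm(doc) * norm(o)) for o, s in dot.items()}


docs = {
    "a": "<book><title/><author/></book>",
    "b": "<book><title/><author/><year/></book>",
    "c": "<movie><director/></movie>",
}
d2p, p2d = build_bipartite(docs, max_len=2)
print(similar_documents("a", d2p, p2d))  # only 'b' appears; 'c' shares no path with 'a'
```

Because similarities are accumulated only along shared-path edges, documents with entirely dissimilar structure (such as "c" above) are never compared at all, which keeps the cost proportional to the number of non-zero entries rather than to the full matrix size.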
Keywords :
XML; document handling; pattern clustering; XML clustering; XML document clustering; XML documents clustering; XML search engine; XML structural representation; association relation; bipartite graph; common Xpath; extensible markup language; path feature; Clustering methods; Educational institutions; Fuzzy systems; Knowledge engineering; Power engineering and energy; Search engines; Sparse matrices; Tellurium; path
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.66