Title :
XML document clustering based on common tag names anywhere in the structure
Author :
Alishahi, Mohamad ; Ravakhah, Mehdi ; Shakeriaski, Baharak ; Naghibzade, Mahmud
Author_Institution :
Islamic Azad Univ. Mashhad Branch, Mashhad, Iran
Abstract :
One of the most effective ways to extract knowledge from large information resources is to apply data mining methods. As the amount of information on the Internet grows rapidly, XML documents have become common because of their many advantages. Extracting knowledge from XML documents is a way to provide more usable results. XCLS is one of the most efficient algorithms for XML document clustering. In this paper we present a new algorithm for clustering XML documents. This algorithm is an improvement over the XCLS algorithm and aims to overcome its problems. We implemented both algorithms and evaluated their clustering quality and running time on the same data sets. On both measures, the new algorithm is shown to perform better.
Keywords :
XML; data mining; document handling; pattern clustering; XCLS algorithm; XML document clustering; clustering quality; data mining methods; information resources; knowledge extraction; Association rules; Clustering algorithms; Internet; Neural networks; Search engines; Tree data structures; Web sites; XML documents; clustering; level similarity; level structure
Conference_Title :
14th International CSI Computer Conference, 2009 (CSICC 2009)
Conference_Location :
Tehran
Print_ISBN :
978-1-4244-4261-4
Electronic_ISBN :
978-1-4244-4262-1
DOI :
10.1109/CSICC.2009.5349643