Title :
Multisets and Clustering XML Documents
Author :
Iyer, Swami ; Simovici, Dan A.
Author_Institution :
University of Massachusetts at Boston, Boston
Abstract :
We propose a novel and efficient solution to the problem of clustering XML documents based on their structure. We use operations on multisets of paths of document trees to define certain metrics on multisets. These metrics are used for clustering real and synthesized XML documents to produce high-quality clusterings.
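The abstract does not state which metrics the authors define, so the following is only a minimal sketch of the general idea under stated assumptions: each document is reduced to the multiset of its root-to-node tag paths, and two documents are compared with a Jaccard-style multiset distance. The function names `path_multiset` and `multiset_distance`, the choice of root-to-node paths, and the distance itself are illustrative assumptions, not the paper's method.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def path_multiset(xml_text):
    """Map each root-to-node tag path to the number of times it occurs.

    Assumption: "paths" means root-to-node tag paths; the paper may
    use a different notion of path.
    """
    root = ET.fromstring(xml_text)
    paths = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def multiset_distance(a, b):
    """Jaccard-style distance on multisets (an assumed metric).

    Counter's & takes the minimum multiplicity (multiset intersection)
    and | takes the maximum multiplicity (multiset union).
    """
    union_size = sum((a | b).values())
    if union_size == 0:
        return 0.0
    intersection_size = sum((a & b).values())
    return 1.0 - intersection_size / union_size


doc_a = "<a><b/><b/><c/></a>"
doc_b = "<a><b/><d/></a>"
distance = multiset_distance(path_multiset(doc_a), path_multiset(doc_b))
```

A matrix of such pairwise distances could then be passed to any standard distance-based clustering algorithm.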
Keywords :
XML; document handling; tree data structures; tree searching; XML document clustering; document tree path; eXtensible Markup Language; high-quality clustering; multisets metrics; Artificial intelligence; Clustering algorithms; Clustering methods; Computer science; Costs; Data mining; Engines; Fourier transforms; Markup languages
Conference_Title :
19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007)
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3015-4
DOI :
10.1109/ICTAI.2007.18