DocumentCode :
1873008
Title :
Structure-oriented clustering of XML documents: A transactional approach
Author :
Costa, Gianni ; Ortale, Riccardo
Author_Institution :
ICAR, Rende, Italy
fYear :
2012
fDate :
6-8 Sept. 2012
Firstpage :
188
Lastpage :
193
Abstract :
Clustering XML documents by structure has generally been accomplished by looking at the occurrence of a single pre-established type of structural component in the structures of the XML documents. Focusing on only one type of structural component is likely to produce clusters with some degree of inner structural inhomogeneity, either because differences in the structures of the XML documents go uncaught or because the choice of structural component is inappropriate. To overcome these limitations, a new parameter-free approach to clustering XML documents is proposed, which simultaneously considers multiple types of structural components to isolate structurally homogeneous clusters of XML documents. The idea behind the approach is to represent each XML document as a transaction of boolean features that highlight a suitable selection of its structural components. A parameter-free clustering scheme is then used to isolate structurally homogeneous clusters. A comparative evaluation over both real and synthetic XML data provides evidence of the effectiveness and efficacy of the devised approach.
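The transactional representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which structural components are selected, so the two feature types used here (root-to-node tag paths and parent-child tag edges) and the helper name `xml_to_transaction` are assumptions chosen for illustration only. The parameter-free clustering scheme is not reproduced.

```python
import xml.etree.ElementTree as ET


def xml_to_transaction(xml_text):
    """Return the set of structural features present in one XML document.

    Hypothetical helper: each feature is boolean (present or absent), so the
    document becomes a "transaction", i.e. the set of features it contains.
    Two assumed feature types are mixed in the same transaction:
      ("path", "a/b/c")        a root-to-node tag path
      ("edge", parent, child)  a parent-child tag pair
    """
    root = ET.fromstring(xml_text)
    features = set()

    def visit(node, path):
        path = path + (node.tag,)
        features.add(("path", "/".join(path)))
        for child in node:
            features.add(("edge", node.tag, child.tag))
            visit(child, path)

    visit(root, ())
    return frozenset(features)


# Two toy documents that differ only by one extra <email/> element.
doc_a = "<paper><title/><author><name/></author></paper>"
doc_b = "<paper><title/><author><name/><email/></author></paper>"

t_a = xml_to_transaction(doc_a)
t_b = xml_to_transaction(doc_b)

# Features of doc_b that doc_a lacks: the structural difference between them.
extra = t_b - t_a
```

Because each document is reduced to a set, structural differences between documents show up as set differences, and several feature types can coexist in one transaction.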
Keywords :
Boolean algebra; XML; data mining; data structures; document handling; pattern clustering; XML document clustering; XML document representation; boolean feature transactional approach; inner structural inhomogeneity; parameter-free approach; parameter-free clustering scheme; real XML data; structural component selection; structurally-homogeneous clusters; structure-oriented clustering; synthetic XML data; Clustering algorithms; Electronic mail; Focusing; Nonhomogeneous media; Partitioning algorithms; Vegetation; XML; Data Mining; XML clustering; XML transactional representation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems (IS), 2012 6th IEEE International Conference
Conference_Location :
Sofia
Print_ISBN :
978-1-4673-2276-8
Type :
conf
DOI :
10.1109/IS.2012.6335134
Filename :
6335134