DocumentCode :
188638
Title :
XML Document Co-clustering via Non-negative Matrix Tri-factorization
Author :
Costa, Gianni ; Ortale, Riccardo
Author_Institution :
ICAR, Rende, Italy
fYear :
2014
fDate :
10-12 Nov. 2014
Firstpage :
607
Lastpage :
614
Abstract :
XML co-clustering is a promising way to improve on the effectiveness of traditional XML clustering approaches, because it exploits the mutual relationships between XML documents and their XML features while clustering both simultaneously. To shed light on this so far unexplored research direction, we conduct a systematic study of the effectiveness of XML co-clustering, viewing the task as parametric with respect to the XML features. We define and exploit three distinct types of XML features, which are informative of the content, the structure, and both aspects of the XML documents, respectively. This allows an in-depth investigation of all three instances of the XML co-clustering task, i.e., XML co-clustering by content alone, by structure alone, and by both structure and content. XML co-clustering relies on a non-negative matrix tri-factorization technique that efficiently processes large-scale input data, which is especially useful with large corpora of text-centric XML documents. The relevance of the structural and content features of the XML documents is assessed through a new weighting scheme. An intensive experimental evaluation on real-world benchmark XML corpora reveals a higher effectiveness of XML co-clustering in comparison with state-of-the-art approaches to XML clustering. Insights are also provided on the effectiveness of XML feature clustering.
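The abstract does not spell out the factorization itself, so the following is only a minimal sketch of generic non-negative matrix tri-factorization, not the authors' algorithm. It assumes a non-negative document-by-feature matrix X (n x m) is approximated as F S G^T, with F (n x k) giving document-cluster memberships, G (m x l) giving feature-cluster memberships, and S (k x l) linking the two. It uses plain unconstrained multiplicative updates for the squared Frobenius error. The paper's feature weighting scheme is not reproduced, and all names (`nmtf`, `labels`, `error`, the toy matrix) are hypothetical.

```python
import random


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def scale(m, num, den, eps=1e-9):
    """Element-wise update m * num / den (eps guards against division by zero)."""
    return [[v * p / (q + eps) for v, p, q in zip(rm, rp, rq)]
            for rm, rp, rq in zip(m, num, den)]


def error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


def nmtf(x, k, l, iters=200, seed=0):
    """Factor non-negative X (n x m) into F (n x k), S (k x l), G (m x l)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])

    def init(rows, cols):
        # Strictly positive random start so multiplicative updates can move.
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    f, s, g = init(n, k), init(k, l), init(m, l)
    xt = transpose(x)
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        gs = matmul(g, transpose(s))
        f = scale(f, matmul(x, gs), matmul(f, matmul(transpose(gs), gs)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        fs = matmul(f, s)
        g = scale(g, matmul(xt, fs), matmul(g, matmul(transpose(fs), fs)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        ft = transpose(f)
        num = matmul(matmul(ft, x), g)
        den = matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g))
        s = scale(s, num, den)
    return f, s, g


def labels(factor):
    """Assign each row to the cluster with the largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in factor]


if __name__ == "__main__":
    # Toy 4-document x 4-feature matrix with two visible blocks (made-up data).
    x_toy = [[3.0, 2.0, 0.0, 0.0],
             [2.0, 3.0, 0.0, 0.0],
             [0.0, 0.0, 4.0, 1.0],
             [0.0, 0.0, 1.0, 4.0]]
    f, s, g = nmtf(x_toy, k=2, l=2)
    print("document clusters:", labels(f))
    print("feature clusters:", labels(g))
    print("reconstruction error:", error(x_toy, f, s, g))
```

Reading cluster labels off the largest entry in each row of F and G is one common convention. A real corpus would need sparse matrices and a numerical library rather than nested lists, and the result depends on the random initialization.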
Keywords :
XML; document handling; matrix decomposition; pattern clustering; XML clustering approach; XML co-clustering; XML coclustering; XML corpora; XML document coclustering; XML feature clustering; content feature; nonnegative matrix trifactorization; structural feature; text-centric XML document; Context; Electronic publishing; Encyclopedias; Matrix decomposition; Vegetation; XML; Semistructured Data Mining; XML Analysis; XML Co-Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on
Conference_Location :
Limassol
ISSN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2014.96
Filename :
6984532