XML Document Co-clustering via Non-negative Matrix Tri-factorization

Author

Costa, Gianni; Ortale, Riccardo

Author_Institution

ICAR, Rende, Italy

fYear

2014

fDate

10-12 Nov. 2014

Firstpage

607

Lastpage

614

Abstract

XML co-clustering is a promising method for improving on the effectiveness of traditional XML clustering approaches, since it exploits the mutual relationships between XML documents and their XML features by clustering both simultaneously. To shed light on this so-far unexplored research direction, we conduct a systematic study of the effectiveness of XML co-clustering, treating the task as parametric with respect to the XML features. Specifically, we define and exploit three distinct types of XML features, informative respectively of the content, the structure, and both aspects of the XML documents. This enables an in-depth investigation of the three corresponding instances of the XML co-clustering task: co-clustering by content alone, by structure alone, and by both structure and content. XML co-clustering relies on a non-negative matrix tri-factorization technique that efficiently processes large-scale input data, which is especially useful for large corpora of text-centric XML documents. The relevance of the structural and content features of the XML documents is assessed through a new weighting scheme. An extensive experimental evaluation on real-world benchmark XML corpora shows that XML co-clustering is more effective than state-of-the-art approaches to XML clustering. Insights are also provided on the effectiveness of XML feature clustering.

Keywords

XML; document handling; matrix decomposition; pattern clustering; XML clustering approach; XML co-clustering; XML coclustering; XML corpora; XML document coclustering; XML feature clustering; content feature; nonnegative matrix trifactorization; structural feature; text-centric XML document; Context; Electronic publishing; Encyclopedias; Matrix decomposition; Vegetation; XML; Semistructured Data Mining; XML Analysis; XML Co-Clustering;

fLanguage

English

Publisher

IEEE

Conference_Titel

Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on

Conference_Location

Limassol

ISSN

1082-3409

Type

conf

DOI

10.1109/ICTAI.2014.96

Filename

6984532