DocumentCode :
1763753
Title :
Semi-Supervised Heterogeneous Fusion for Multimedia Data Co-Clustering
Author :
Lei Meng ; Ah-Hwee Tan ; Dong Xu
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Volume :
26
Issue :
9
fYear :
2014
fDate :
Sept. 2014
Firstpage :
2293
Lastpage :
2306
Abstract :
Co-clustering is a commonly used technique for tapping the rich meta-information of multimedia web documents, including category, annotation, and description, for associative discovery. However, most co-clustering methods proposed for heterogeneous data do not consider the representation problem of short and noisy text, and their performance is limited by the empirical weighting of the multi-modal features. In this paper, we propose a generalized form of Heterogeneous Fusion Adaptive Resonance Theory, called GHF-ART, for co-clustering of large-scale web multimedia documents. By extending the two-channel Heterogeneous Fusion ART (HF-ART) to multiple channels, GHF-ART is designed to handle multimedia data with an arbitrarily rich level of meta-information. For handling short and noisy text, GHF-ART does not learn directly from the textual features. Instead, it identifies key tags by learning the probabilistic distribution of tag occurrences. More importantly, GHF-ART incorporates an adaptive method for effective fusion of multi-modal features, which weights the features of multiple data sources by incrementally measuring the importance of feature modalities through the intra-cluster scatters. Extensive experiments on two web image data sets and one text document set have shown that GHF-ART achieves significantly better clustering performance and is much faster than many existing state-of-the-art algorithms.
Keywords :
ART neural nets; Internet; data mining; document handling; learning (artificial intelligence); multimedia computing; pattern clustering; statistical distributions; GHF-ART; Web image data sets; adaptive method; associative discovery; heterogeneous data; heterogeneous fusion adaptive resonance theory; intra-cluster scatters; large-scale Web multimedia documents; multimedia Web documents; multimedia data co-clustering method; multimedia data handling; multimedia data mining; multimodal features; multiple data sources; noisy text; probabilistic distribution; rich meta-information; semisupervised heterogeneous fusion; semisupervised learning; tag occurrences; text document set; textual features; two-channel heterogeneous fusion ART; Clustering algorithms; Feature extraction; Multimedia communication; Pattern matching; Subspace constraints; Vectors; Visualization; Semi-supervised learning; heterogeneous data co-clustering; multimedia data mining
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2013.47
Filename :
6482563