• DocumentCode
    2769686
  • Title
    Semi-supervised hierarchical clustering for personalized web image organization
  • Author
    Meng, Lei ; Tan, Ah-Hwee
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Existing efforts on web image organization usually transform the task into clustering of the surrounding text. However, current text clustering algorithms do not address the problems of insufficient statistical information for image representation and noisy tags, which greatly decrease clustering performance while increasing computational cost. In this paper, we propose a two-step semi-supervised hierarchical clustering algorithm, Personalized Hierarchical Theme-based Clustering (PHTC), for web image organization. In the first step, Probabilistic Fusion ART (PF-ART) is proposed for grouping semantically similar images while simultaneously learning the probabilistic distribution of tag occurrence in order to mine the key tags/topics of clusters. In this way, the side effect of noisy tags can be largely eliminated. Moreover, PF-ART can incorporate user preferences for semi-supervised learning and give users direct control over the clustering results. In the second step, a novel agglomerative merging strategy is employed to associate the clusters by generating a semantic hierarchy; it is based on Cluster Semantic Relevance, a measure proposed for the semantic similarity between clusters. Unlike existing hierarchical clustering algorithms, the proposed merging strategy produces a multi-branch tree structure that is more systematic and clearer than the traditional binary tree structure. Extensive experiments on two real-world web image data sets, namely NUS-WIDE and Flickr, demonstrate the effectiveness of our algorithm for large web image data sets.
  • Keywords
    ART neural nets; Internet; image representation; image retrieval; learning (artificial intelligence); text analysis; trees (mathematics); PF-ART; PHTC; cluster semantic relevance; image representation; large Web image data sets; multibranch tree structure; noisy tags; novel agglomerative merging strategy; personalized Web image organization; personalized hierarchical theme-based clustering; probabilistic distribution; probabilistic fusion ART; tag occurrence; text clustering algorithms; two-step semi-supervised hierarchical clustering algorithm; Clustering algorithms; Feature extraction; Probabilistic logic; Prototypes; Semantics; Subspace constraints; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type
    conf
  • DOI
    10.1109/IJCNN.2012.6252397
  • Filename
    6252397