• DocumentCode
    639370
  • Title

    A Fast Approximate AIB Algorithm for Distributional Word Clustering

  • Author

    Lei Wang ; Jianjia Zhang ; Luping Zhou ; Wanqing Li

  • Author_Institution
    Sch. of Comput. Sci. & Software Eng., Univ. of Wollongong, Wollongong, NSW, Australia
  • fYear
    2013
  • fDate
    23-28 June 2013
  • Firstpage
    556
  • Lastpage
    563
  • Abstract
    Distributional word clustering merges the words having similar probability distributions to attain reliable parameter estimation, compact classification models and even better classification performance. Agglomerative Information Bottleneck (AIB) is one of the typical word clustering algorithms and has been applied to both traditional text classification and recent image recognition. Although enjoying theoretical elegance, AIB has one main issue on its computational efficiency, especially when clustering a large number of words. Different from existing solutions to this issue, we analyze the characteristics of its objective function-the loss of mutual information, and show that by merely using the ratio of word-class joint probabilities of each word, good candidate word pairs for merging can be easily identified. Based on this finding, we propose a fast approximate AIB algorithm and show that it can significantly improve the computational efficiency of AIB while well maintaining or even slightly increasing its classification performance. Experimental study on both text and image classification benchmark data sets shows that our algorithm can achieve more than 100 times speedup on large real data sets over the state-of-the-art method.
  • Keywords
    approximation theory; image classification; pattern clustering; probability; text analysis; AIB; agglomerative information bottleneck; compact classification models; computational efficiency; distributional word clustering; fast approximate AIB algorithm; image classification benchmark; image recognition; probability distributions; reliable parameter estimation; text classification benchmark; word clustering algorithms; Approximation algorithms; Barium; Joints; Merging; Mutual information; Time complexity; Agglomerative Information Bottleneck; Image Recognition; Visual Codebook; Word clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    1063-6919
  • Type

    conf

  • DOI
    10.1109/CVPR.2013.78
  • Filename
    6618922