• DocumentCode
    1742934
  • Title
    A probabilistic hierarchical clustering method for organising collections of text documents
  • Author
    Vinokourov, Alexei; Girolami, Mark
  • Author_Institution
    Dept. of Comput. & Inf. Syst., Paisley Univ., UK
  • Volume
    2
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    182
  • Abstract
    A generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on multinomial and binomial distributions are most appropriate. An expectation maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is obtained for two extensive online document collections.
  • Keywords
    binomial distribution; parameter estimation; pattern clustering; text analysis; unsupervised learning; asymmetric models; binomial distributions; expectation maximisation parameter estimation method; hierarchical probabilistic mixture methodology; large-scale sparse high-dimensional data collections; multinomial distributions; online document collections; probabilistic hierarchical clustering method; symmetric models; text documents; unsupervised hierarchical clustering; Clustering methods; Computational intelligence; Costs; Databases; Information retrieval; Information systems; Internet; Large-scale systems; Parameter estimation; Topology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 15th International Conference on Pattern Recognition (ICPR 2000)
  • Conference_Location
    Barcelona
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-0750-6
  • Type
    conf
  • DOI
    10.1109/ICPR.2000.906043
  • Filename
    906043