DocumentCode
1742934
Title
A probabilistic hierarchical clustering method for organising collections of text documents
Author
Vinokourov, Alexei ; Girolami, Mark
Author_Institution
Dept. of Comput. & Inf. Syst., Paisley Univ., UK
Volume
2
fYear
2000
fDate
2000
Firstpage
182
Abstract
A generic probabilistic framework for the unsupervised hierarchical clustering of large-scale, sparse, high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis, termed symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on multinomial and binomial distributions are most appropriate. An expectation maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is presented for two extensive online document collections.
Keywords
binomial distribution; parameter estimation; pattern clustering; text analysis; unsupervised learning; asymmetric models; binomial distributions; expectation maximisation parameter estimation method; hierarchical probabilistic mixture methodology; large-scale sparse high-dimensional data collections; multinomial distributions; online document collections; probabilistic hierarchical clustering method; symmetric models; text documents; unsupervised hierarchical clustering; Clustering methods; Computational intelligence; Costs; Databases; Information retrieval; Information systems; Internet; Large-scale systems; Parameter estimation; Topology
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2000. Proceedings. 15th International Conference on
Conference_Location
Barcelona
ISSN
1051-4651
Print_ISBN
0-7695-0750-6
Type
conf
DOI
10.1109/ICPR.2000.906043
Filename
906043