DocumentCode :
1742934
Title :
A probabilistic hierarchical clustering method for organising collections of text documents
Author :
Vinokourov, Alexei ; Girolami, Mark
Author_Institution :
Dept. of Comput. & Inf. Syst., Paisley Univ., UK
Volume :
2
fYear :
2000
fDate :
2000
Firstpage :
182
Abstract :
A generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been called symmetric and asymmetric models. For text data specifically, both asymmetric and symmetric models based on multinomial and binomial distributions are most appropriate. An expectation maximisation parameter estimation method is provided for all of these models. An experimental comparison of the models is obtained for two extensive online document collections.
Keywords :
binomial distribution; parameter estimation; pattern clustering; text analysis; unsupervised learning; asymmetric models; binomial distributions; expectation maximisation parameter estimation method; hierarchical probabilistic mixture methodology; large-scale sparse high-dimensional data collections; multinomial distributions; online document collections; probabilistic hierarchical clustering method; symmetric models; text documents; unsupervised hierarchical clustering; Clustering methods; Computational intelligence; Costs; Databases; Information retrieval; Information systems; Internet; Large-scale systems; Parameter estimation; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2000. Proceedings. 15th International Conference on
Conference_Location :
Barcelona
ISSN :
1051-4651
Print_ISBN :
0-7695-0750-6
Type :
conf
DOI :
10.1109/ICPR.2000.906043
Filename :
906043