Author_Institution :
Concordia Inst. for Inf. Syst. Eng., Concordia Univ., Montreal, QC, Canada
Abstract :
In this paper, we consider the problem of constructing accurate and flexible statistical representations for count data, which arise in many areas such as data mining, computer vision, and information retrieval. In particular, we analyze and compare several generative approaches widely used for count data clustering, namely multinomial, multinomial Dirichlet, and multinomial generalized Dirichlet mixture models. Moreover, we propose a clustering approach via a mixture model based on a composition of the multinomial with a member of the Liouville family of distributions, for which we select the Beta-Liouville distribution. The proposed model, which we call the multinomial Beta-Liouville mixture, is optimized by deterministic annealing expectation-maximization and minimum description length, and aims to achieve high accuracy in both count data clustering and model selection. An important feature of the multinomial Beta-Liouville mixture is that it has fewer parameters than the recently proposed multinomial generalized Dirichlet mixture. The performance evaluation is conducted through a set of extensive empirical experiments on text and image texture modeling and classification and on shape modeling, and it highlights the merits of the proposed models and approaches.
Keywords :
Liouville equation; expectation-maximisation algorithm; feature extraction; image texture; pattern clustering; solid modelling; text analysis; Beta Liouville distribution; count data modeling; data classification; data clustering; expectation maximization approach; finite distribution mixture; image classification; image texture modeling; multinomial generalized Dirichlet mixture; shape modeling; Annealing; Computational modeling; Data mining; Data models; Equations; Shape; Count data; Dirichlet; Fisher kernel; Liouville; deterministic annealing expectation-maximization; finite mixture models; generalized Dirichlet; model selection; multinomial; support vector machine; text categorization; texture classification; Algorithms; Artificial Intelligence; Automatic Data Processing; Computer Simulation; Humans; Mathematical Concepts; Models, Theoretical; Neural Networks (Computer); Pattern Recognition, Automated