Title :
Probabilistic clustering based on Langevin mixture
Author :
Amayri, Ola ; Bouguila, Nizar
Author_Institution :
Electr. & Comput. Eng. Dept., Concordia Univ., Montreal, QC, Canada
Abstract :
In this paper, we propose a statistical framework for clustering spherical data, which commonly arise in machine learning, data mining, and computer vision applications. Our framework is based on finite Langevin mixture models, which provide a natural representation of normalized vectors in high-dimensional spaces, where the data lie on the unit hypersphere. Moreover, we develop a minimum message length (MML) criterion for the selection of finite Langevin mixture components, from which different probabilistic information divergence distances are then derived. Through empirical experiments, we demonstrate the merits of the proposed learning framework on challenging applications involving spam filtering using visual email content and email categorization.
Keywords :
pattern clustering; unsolicited e-mail; vectors; computer vision application; data mining; email categorization; finite Langevin mixture component; finite Langevin mixture model; machine learning; minimum message length criterion; normalized vector; probabilistic clustering; probabilistic information divergence distance; spam filtering; spherical data clustering; statistical framework; visual email content; Accuracy; Data models; Electronic mail; Machine learning; Probabilistic logic; Vectors;
Conference_Title :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
DOI :
10.1109/icmla.2011.6174513