DocumentCode :
1041326
Title :
Simultaneous feature selection and clustering using mixture models
Author :
Law, Martin H. C.; Figueiredo, Mário A. T.; Jain, Anil K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
Volume :
26
Issue :
9
fYear :
2004
Firstpage :
1154
Lastpage :
1166
Abstract :
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
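The abstract describes estimating per-feature saliencies with EM inside a mixture model. As an illustration only, the sketch below assumes one particular form of that model. Given cluster j, feature l is drawn with probability rho[l] from a cluster-specific Gaussian and otherwise from a single Gaussian shared by all clusters, so rho[l] plays the role of that feature's saliency. The sketch runs plain maximum-likelihood EM with a fixed number of clusters k. It deliberately omits the minimum message length criterion, which, per the abstract, is what drives the saliencies of irrelevant features toward zero. The names `saliency_em` and `gauss_pdf`, the farthest-first initialisation, and the variance floor `min_var` are hypothetical choices for this sketch and are not taken from the paper.

```python
import math


def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def saliency_em(data, k, n_iter=100, min_var=1e-3):
    """Maximum-likelihood EM for an ASSUMED feature-saliency Gaussian mixture.

    Given cluster j (weight alpha[j]), feature l is drawn with probability rho[l]
    from N(mu[j][l], var[j][l]) and otherwise from a common N(m[l], s[l]) shared
    by all clusters. No MML penalty is applied, so rho is never pruned to zero.
    """
    n, d = len(data), len(data[0])
    # Common ("irrelevant") density per feature starts at the global mean and variance.
    m = [sum(x[l] for x in data) / n for l in range(d)]
    s = [max(sum((x[l] - m[l]) ** 2 for x in data) / n, min_var) for l in range(d)]
    # Hypothetical deterministic farthest-first choice of the k initial cluster means.
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda x: min(
            sum((p - q) ** 2 for p, q in zip(x, c)) for c in centers)))
    mu = [list(c) for c in centers]
    var = [list(s) for _ in range(k)]
    alpha, rho = [1.0 / k] * k, [0.5] * d

    for _ in range(n_iter):
        # E-step: w[i][j] = P(z_i = j | y_i),
        #         u[i][j][l] = P(z_i = j, feature l relevant | y_i).
        w, u = [], []
        for x in data:
            a = [[rho[l] * gauss_pdf(x[l], mu[j][l], var[j][l]) for l in range(d)]
                 for j in range(k)]
            b = [(1.0 - rho[l]) * gauss_pdf(x[l], m[l], s[l]) for l in range(d)]
            logp = [math.log(max(alpha[j], 1e-300))
                    + sum(math.log(max(a[j][l] + b[l], 1e-300)) for l in range(d))
                    for j in range(k)]
            top = max(logp)
            e = [math.exp(lp - top) for lp in logp]  # log-sum-exp normalisation
            z = sum(e)
            wi = [ej / z for ej in e]
            w.append(wi)
            u.append([[wi[j] * a[j][l] / max(a[j][l] + b[l], 1e-300) for l in range(d)]
                      for j in range(k)])
        # M-step: cluster weights and cluster-specific ("relevant") densities.
        for j in range(k):
            alpha[j] = sum(w[i][j] for i in range(n)) / n
            for l in range(d):
                r = sum(u[i][j][l] for i in range(n))
                if r > 1e-12:
                    mu[j][l] = sum(u[i][j][l] * data[i][l] for i in range(n)) / r
                    var[j][l] = max(sum(u[i][j][l] * (data[i][l] - mu[j][l]) ** 2
                                        for i in range(n)) / r, min_var)
        # M-step: common ("irrelevant") densities and saliencies.
        for l in range(d):
            v = [sum(w[i][j] - u[i][j][l] for j in range(k)) for i in range(n)]
            t = sum(v)
            if t > 1e-12:
                m[l] = sum(v[i] * data[i][l] for i in range(n)) / t
                s[l] = max(sum(v[i] * (data[i][l] - m[l]) ** 2 for i in range(n)) / t,
                           min_var)
            # rho[l] is the average posterior probability that feature l is relevant,
            # clamped away from 0 and 1 so both branches keep a positive weight.
            rho[l] = min(max(sum(sum(u[i][j][l] for j in range(k)) for i in range(n)) / n,
                             1e-6), 1.0 - 1e-6)
    return alpha, rho, mu, var
```

For example, on two-dimensional data whose first coordinate separates two groups and whose second coordinate follows the same noise pattern in both, `saliency_em(points, k=2)` should return a high `rho[0]`. Because the MML penalty is omitted, `rho[1]` is not guaranteed to shrink toward zero; that pruning is the role the abstract assigns to the MML criterion.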
Keywords :
Gaussian processes; maximum likelihood estimation; optimisation; pattern clustering; unsupervised learning; clustering algorithms; expectation maximization algorithm; feature clustering; feature selection; minimum message length model; mixture models; mixture-based clustering; Computer vision; Data analysis; Degradation; Helium; Image segmentation; Information retrieval; Partitioning algorithms; Supervised learning; EM algorithm; clustering; minimum message length; Algorithms; Artificial Intelligence; Cluster Analysis; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Models, Biological; Models, Statistical; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2004.71
Filename :
1316850