DocumentCode :
3165153
Title :
Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection
Author :
Ando, Shin
Author_Institution :
Yokohama Nat. Univ., Yokohama
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
13
Lastpage :
22
Abstract :
Identifying atypical objects is a traditional topic in machine learning. Recently, novel approaches such as Minority Detection and One-class clustering have gone further, identifying clusters of atypical objects that contrast strongly with the rest of the data in their distribution or density. This paper analyzes such tasks from an information-theoretic perspective. Under an Information Bottleneck formalization, these tasks can be interpreted as increasing the average atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unified view of the new approaches and of classic outlier detection. We also present a scalable minimization algorithm that exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated on simulated datasets and a text classification benchmark, in comparison with an existing method.
Keywords :
learning (artificial intelligence); object detection; pattern classification; information bottleneck formalization; information theoretic analysis; machine learning; minority detection; needles clustering; one-class clustering; scalable minimization algorithm; simulated datasets; text classification; Clustering algorithms; Cost function; Data mining; Information analysis; Machine learning; Machine learning algorithms; Needles; Object detection; Rate distortion theory; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3018-5
Type :
conf
DOI :
10.1109/ICDM.2007.53
Filename :
4470225