DocumentCode :
2710513
Title :
Graph-Based Rare Category Detection
Author :
He, Jingrui ; Liu, Yan ; Lawrence, Richard
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
833
Lastpage :
838
Abstract :
Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays a key role in real applications such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, for cases where detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which cannot be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.
Keywords :
data mining; graph theory; learning (artificial intelligence); pattern classification; pattern clustering; probability; GRADE graph-based rare category detection algorithm; GRADE-LI algorithm; global similarity matrix; machine learning; majority class; manifold ranking algorithm; minority class; probability density; unlabeled data set clustering; astronomy; clustering algorithms; image sampling; intrusion detection; manifolds; object detection; upper bound; graph; rare category detection
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3502-9
Type :
conf
DOI :
10.1109/ICDM.2008.122
Filename :
4781187