DocumentCode :
2207919
Title :
Exploiting Local Data Uncertainty to Boost Global Outlier Detection
Author :
Liu, Bo ; Yin, Jie ; Xiao, Yanshan ; Cao, Longbing ; Yu, Philip S.
Author_Institution :
Fac. of Eng. & IT, QCIS, Univ. of Technol., Sydney, NSW, Australia
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
304
Lastpage :
313
Abstract :
This paper presents a novel hybrid approach to outlier detection that incorporates local data uncertainty into the construction of a global classifier. To deal with local data uncertainty, we assign a confidence value to each example in the training data, which measures the strength of its class label. The proposed method works in two steps. First, we generate a pseudo training dataset by computing a confidence value for each input example with respect to its class label. We present two mechanisms for computing the confidence values from local data behavior: a kernel k-means clustering algorithm and a kernel LOF-based algorithm. Second, we construct a global classifier for outlier detection by generalizing the SVDD-based learning framework to incorporate both positive and negative examples as well as their associated confidence values. By integrating local and global outlier detection, the proposed method explicitly handles the uncertainty of the input data and reduces the sensitivity of SVDD to noise. Extensive experiments on real-life datasets demonstrate that the proposed method achieves a better tradeoff between detection rate and false alarm rate than four state-of-the-art outlier detection algorithms.
Keywords :
data description; learning (artificial intelligence); pattern clustering; probability; support vector machines; uncertainty handling; SVDD based learning; boost global outlier detection; kernel LOF based algorithm; kernel k-means clustering; local data uncertainty; pseudo training dataset; support vector data description; Data uncertainty; Outlier detection; SVDD
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.10
Filename :
5693984