Title :
SLOF: Identify Density-Based Local Outliers in Big Data
Author :
Haowen Guan; Qingzhong Li; Zhongmin Yan; Wei Wei
Author_Institution :
Sch. of Comput. Sci. &
Abstract :
Outlier detection methods are now widely used across many domains. The density-based Local Outlier Factor (LOF) method is among the most commonly used. In big data, however, datasets are very large, high-dimensional, and sparse, which makes LOF poorly suited to them. To address these characteristics, we propose a novel method, SLOF. We represent the complex, high-dimensional objects in a dataset as vectors and compute the distances between objects using vector similarity. We also adopt a feature bagging approach to make SLOF robust and accurate. We compare the performance of SLOF, LOF, and PINN. The experimental results show that the distribution of SLOF scores is more stable and that SLOF achieves much better recall and precision than LOF and PINN.
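The abstract does not give SLOF's exact formulas, so the following is only a minimal illustrative sketch of the general idea it describes: LOF computed over a similarity-based distance, with scores combined across random feature subsets. Cosine distance, averaging of scores across rounds, the subset-size range, the simplified k-neighbourhood (exactly k neighbours, ties ignored), and all function names and parameters are assumptions made for illustration, not the authors' implementation.

```python
import math
import random

def cosine_distance(u, v):
    # Assumed similarity-based distance: 1 - cosine similarity.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (nu * nv))

def lof_scores(points, k, dist):
    # Simplified LOF: each point uses exactly its k nearest neighbours.
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        neigh.append(order[:k])
        kdist.append(d[i][order[k - 1]])
    eps = 1e-12  # guards against division by zero for duplicate points
    lrd = []
    for i in range(n):
        # Reachability distance of i from j is max(k-distance(j), d(i, j)).
        reach = sum(max(kdist[j], d[i][j]) for j in neigh[i]) / k
        lrd.append(1.0 / (reach + eps))
    # LOF is the mean neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

def bagged_lof(points, k=3, rounds=10, seed=0):
    # Feature bagging: score on random feature subsets, then average.
    rng = random.Random(seed)
    dim = len(points[0])
    total = [0.0] * len(points)
    for _ in range(rounds):
        size = rng.randint(max(1, dim // 2), max(1, dim - 1))
        feats = rng.sample(range(dim), size)
        proj = [[p[f] for f in feats] for p in points]
        for i, s in enumerate(lof_scores(proj, k, cosine_distance)):
            total[i] += s
    return [t / rounds for t in total]

# Hypothetical toy data: eight similar vectors plus one differently oriented one.
points = [
    [1.0, 1.0, 1.0, 1.0], [1.1, 1.0, 0.9, 1.0], [0.9, 1.1, 1.0, 1.0],
    [1.0, 0.9, 1.1, 1.0], [1.0, 1.0, 0.9, 1.1], [1.1, 0.9, 1.0, 1.0],
    [1.0, 1.1, 1.0, 0.9], [0.9, 1.0, 1.0, 1.1],
    [1.0, -2.0, 4.0, -8.0],
]
scores = bagged_lof(points, k=3, rounds=10, seed=0)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup a higher averaged score means a point is less dense than its neighbours, so `most_outlying` is the index of the strongest outlier candidate.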
Keywords :
"Big data","Bagging","Euclidean distance","Data mining","Robustness","Computer science","Feature extraction"
Conference_Title :
Web Information System and Application Conference (WISA), 2015 12th
Print_ISBN :
978-1-4673-9371-3
DOI :
10.1109/WISA.2015.40