Title :
Representative sequence selection in unsupervised anomaly detection using spectrum kernel with theoretical parameter setting
Author :
Skudlarek, Stefan Jan ; Yamamoto, Hirosuke
Author_Institution :
Grad. Sch. of Frontier Sci., Univ. of Tokyo, Chiba, Japan
Abstract :
Unsupervised anomaly detection is an important topic of data mining research, especially with respect to non-numerical sequence data. However, most previous algorithms rely on empirical parameter selection. The contribution of this study is twofold. First, we show how the Akaike Information Criterion can be used to set the parameter of the spectrum kernel. Second, we present a distance-based algorithm for one-class unsupervised anomaly detection. The algorithm uses the distance matrix of the data to select a sequence representative of the normal class by means of robust statistics. The proposed algorithm is applied to two kinds of sequence data to demonstrate its suitability.
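The distance-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state which robust statistic is used or how the distances are defined, so everything below is an assumption made for illustration: the spectrum kernel is taken as the inner product of k-mer counts, the distance is the kernel-induced Euclidean distance, the robust statistic is the median, and k is fixed by hand rather than chosen by the Akaike Information Criterion. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import median


def spectrum(seq, k):
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k):
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def kernel_distance(a, b, k):
    """Euclidean distance in the feature space induced by the kernel."""
    d2 = (spectrum_kernel(a, a, k)
          - 2 * spectrum_kernel(a, b, k)
          + spectrum_kernel(b, b, k))
    return sqrt(max(d2, 0))  # guard against tiny negative rounding errors


def select_representative(seqs, k):
    """Return the index of the sequence with the smallest median distance
    to all other sequences, together with the full distance matrix.

    Assumption: the median stands in for the paper's robust statistic.
    """
    n = len(seqs)
    dist = [[kernel_distance(seqs[i], seqs[j], k) for j in range(n)]
            for i in range(n)]
    med = [median(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    rep = min(range(n), key=med.__getitem__)
    return rep, dist


# Toy data: three similar sequences and one that differs strongly.
seqs = ["abababab", "abababa", "bababab", "xyzxyzxy"]
rep, dist = select_representative(seqs, k=2)

# Anomaly score of each sequence: its distance to the representative.
scores = [dist[rep][j] for j in range(len(seqs))]
```

In this sketch, a sequence far from the representative receives a high score and would be flagged as anomalous. The median is used because a minority of anomalous sequences has little influence on it, which is the general motivation for robust statistics in this setting.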
Keywords :
data mining; matrix algebra; security of data; statistical analysis; unsupervised learning; Akaike Information Criterion; data mining research; distance matrix; distance-based algorithm; empirical parameter selection; nonnumerical sequence data; one-class unsupervised anomaly detection; representative sequence selection; robust statistics; spectrum kernel; theoretical parameter setting; Robustness; anomaly detection; sequence data; unsupervised;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
DOI :
10.1109/ICMLC.2010.5580497