DocumentCode :
1625102
Title :
A preprocessing of outlier using kernel PCA and factor scores in regression model
Author :
Oh, Kyung-Whan ; Jun, Sunghae ; Kim, Yong-Jun
Author_Institution :
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul, South Korea
fYear :
2009
Firstpage :
2132
Lastpage :
2135
Abstract :
Analyzing data that contain outliers is more difficult than analyzing data without them. In supervised learning tasks such as classification and regression, outliers can increase the misclassification rate and the variance of the estimates. In clustering, an unsupervised learning task, outliers can form a cluster of their own, which makes it hard to represent the clustering result. Because of these problems, outliers are generally removed before a model is constructed in data mining. However, when an outlier carries information about the given data, it should not be removed from the training data set. In this paper, we propose a preprocessing method based on kernel PCA (principal component analysis) and factor scores that keeps the outliers in the modeling. The values of kernel PCA and the factor scores reduce the effect of outliers in the given training data set. Experimental results on simulated data sets with a regression model verify the improved performance of our approach.
Keywords :
data analysis; data mining; estimation theory; pattern classification; pattern clustering; principal component analysis; regression analysis; unsupervised learning; clustering method; data analysis; data mining; factor score; kernel PCA; misclassification rate; outlier preprocessing method; principal component analysis; regression model; simulation data set; supervised learning; training data set; unsupervised learning; variance estimate; Computer science; Data mining; Intrusion detection; Kernel; Principal component analysis; Statistical analysis; Supervised learning; Testing; Training data; Unsupervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on
Conference_Location :
Jeju Island
ISSN :
1098-7584
Print_ISBN :
978-1-4244-3596-8
Electronic_ISBN :
1098-7584
Type :
conf
DOI :
10.1109/FUZZY.2009.5277180
Filename :
5277180