DocumentCode :
2693566
Title :
A cluster based approach to robust regression and outlier detection
Author :
Maiywan, S. ; Kashyap, R.L.
Author_Institution :
PLD-BU, Component R&D, Intel Corp., Folsom, CA, USA
Volume :
3
fYear :
1994
fDate :
2-5 Oct 1994
Firstpage :
2561
Abstract :
We consider the simultaneous estimation of the inlier set and the regression parameters from a contaminated data set, A, in which many members obey the linear model but some do not. The single-link clustering algorithm is used to partition the data set into inlier and outlier subsets. For a fixed link distance, λ, the subset with the least mean sum of squared residuals is chosen as the best among all subsets of A. Assuming that the outliers obey the same type of density as the inliers but with different parameters, we develop a negative log-likelihood expression for all the observations in terms of λ. Minimizing the negative log-likelihood function with respect to λ yields the required λ* and the regression parameters. We apply the theory developed here to two data sets and show that our method correctly estimates the regression parameters and the inlier set for each data set.
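The fixed-λ step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say in which space the points are clustered (here, the joint (x, y) plane with Euclidean distance), it does not give the form of the negative log-likelihood (so the outer search over λ is omitted), and the names `single_link_clusters` and `best_subset` as well as the minimum cluster size are hypothetical choices.

```python
import numpy as np

def single_link_clusters(points, lam):
    """Label points so that two points share a label when a chain of
    hops, each of length <= lam, connects them (single-link at lam)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unlabelled neighbours within lam join the current cluster.
            for j in np.flatnonzero((dist[i] <= lam) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def best_subset(x, y, lam):
    """For a fixed lam, fit a line to each cluster by least squares and
    return (mse, indices, beta) for the cluster with the smallest mean
    squared residual, or None if no cluster is large enough."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + slope
    # Assumption: cluster in the joint (x, y) plane.
    labels = single_link_clusters(np.column_stack([x, y]), lam)
    best = None
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Assumption: skip clusters too small to leave a nonzero residual.
        if len(idx) <= X.shape[1]:
            continue
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        mse = np.mean((y[idx] - X[idx] @ beta) ** 2)
        if best is None or mse < best[0]:
            best = (mse, idx, beta)
    return best

# Ten points on y = 1 + 2x plus three points far from the line.
x = np.concatenate([np.arange(10.0), [3.0, 4.0, 5.0]])
y = np.concatenate([1.0 + 2.0 * np.arange(10.0), [40.0, 42.0, 40.0]])
mse, inliers, beta = best_subset(x, y, lam=3.0)
```

In the method the abstract describes, this selection would be repeated over candidate values of λ, with the negative log-likelihood deciding which λ to keep; that criterion is defined in the paper itself and is not reproduced here.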
Keywords :
least mean squares methods; parameter estimation; pattern recognition; statistical analysis; cluster based approach; contaminated data set; fixed link distance; inlier set; link clustering algorithm; negative log likelihood expression; outlier detection; robust regression; simultaneous estimation; Clustering algorithms; Gaussian distribution; Least squares approximation; Linear regression; Parameter estimation; Partitioning algorithms; Robustness; Tin;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics, 1994. Humans, Information and Technology., 1994 IEEE International Conference on
Conference_Location :
San Antonio, TX
Print_ISBN :
0-7803-2129-4
Type :
conf
DOI :
10.1109/ICSMC.1994.400256
Filename :
400256