DocumentCode :
2750726
Title :
Outlier Detection by Regression Diagnostics in Large Data
Author :
Nurunnabi, A.A.M. ; Nasser, Mohammed
Author_Institution :
Sch. of Bus., Uttara Univ., Dhaka, Bangladesh
fYear :
2009
fDate :
3-5 April 2009
Firstpage :
246
Lastpage :
250
Abstract :
Regression analysis is a well-known supervised learning technique. To estimate and justify an effective regression model, it is necessary to check and preprocess the data set. Real data are rarely free of outliers (noise). In areas such as bioinformatics, astronomy, image analysis, and computer vision, large or fat data sets naturally contain unusual observations (outliers). In these fields, robust regression is commonly used in the model-building process, but robust regression methods do not perform well enough on large and/or high-dimensional data. Regression diagnostics is the practice of checking raw data for outliers in regression. Robust regression and regression diagnostics are complementary approaches, and neither alone is sufficient for studying contaminated data. Most popular diagnostic methods are inadequate for large data because of masking and swamping. In this article, both approaches are briefly discussed, and we show that a new measure can effectively identify outliers (influential observations) in linear regression for large data.
Keywords :
data analysis; learning (artificial intelligence); regression analysis; contaminated data; outlier detection; regression analysis; regression diagnostics; robust regression; supervised learning; Astronomy; Business communication; Data mining; Learning systems; Machine learning; Neural networks; Noise robustness; Parameter estimation; Regression analysis; Supervised learning; influential observation; learning method; outlier; regression diagnostics; robust regression;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Future Computer and Communication, 2009. ICFCC 2009. International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-0-7695-3591-3
Type :
conf
DOI :
10.1109/ICFCC.2009.60
Filename :
5189782