Title :
Regression diagnostics in large and high dimensional data
Author :
Nurunnabi, A.A.M. ; Nasser, Mohammed
Author_Institution :
Sch. of Bus., Uttara Univ., Dhaka
Abstract :
"Learning methods" play a key role in statistics, data mining, and artificial intelligence, intersecting with engineering and other disciplines. These methods for analyzing and modeling data come in two flavors: supervised and unsupervised learning. Regression analysis and classification are two well-known supervised learning techniques. To obtain an effective model from regression analysis, it is necessary to check and preprocess the data set, especially when it is large and high dimensional, as is common in astronomy, bioinformatics, image analysis, computer vision, etc. In these fields, large or fat data naturally contain unusual observations (outliers). Checking raw data for outliers in regression is known as regression diagnostics. Most of the popular diagnostic methods are not adequate for large and high dimensional data. The aim of this paper is to provide a new measure for identifying influential observations in linear regression for large, high dimensional data.
Keywords :
data analysis; learning (artificial intelligence); pattern classification; regression analysis; high dimensional data modeling; large dimensional data; outlier detection; regression classification; regression diagnostics; unsupervised learning; Artificial intelligence; Astronomy; Data engineering; Data mining; Image analysis; Statistics; Supervised learning; high dimensional data; influential observation; outlier
Conference_Title :
Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
Conference_Location :
Khulna
Print_ISBN :
978-1-4244-2135-0
Electronic_ISBN :
978-1-4244-2136-7
DOI :
10.1109/ICCITECHN.2008.4802969