DocumentCode :
1953891
Title :
Data mining: where do we start?
Author :
De Veaux, Richard D.
Author_Institution :
Dept. of Math. & Stat., Williams Coll., Williamstown, MA, USA
fYear :
2003
fDate :
16-19 June 2003
Firstpage :
19
Abstract :
Summary form only given. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (D. Hand, 2001). Much of exploratory data analysis (EDA) and inferential statistics concerns the same problems. Part of the challenge of data mining is the sheer size of the data sets and/or the number of possible predictor variables. With 500 potential predictor variables, just summarizing and graphing them to start the process is impossible. Instead, in data mining, we may start the process by creating a preliminary model just to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice but, in fact, it is simply using models as a new exploratory tool. We take a brief tour of the current state of data mining algorithms and, using several case studies, explain how EDM can easily be used to narrow the search for a useful predictive model and to increase the chances of producing useful, meaningful results.
Keywords :
data analysis; data mining; data models; very large databases; EDA; EDM; exploratory data analysis; exploratory data modeling; inferential statistics; large data sets; model selection; observational data sets; predictor variables; Educational institutions; Electronic design automation and methodology; Information technology; Mathematics; Predictive models; Statistical analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on
ISSN :
1330-1012
Print_ISBN :
953-96769-6-7
Type :
conf
DOI :
10.1109/ITI.2003.1225315
Filename :
1225315