DocumentCode
1844916
Title
Applying sensitivity analysis to missing data in classifiers
Author
Lei, Lei ; Wu, Naijun ; Liu, Peng
Author_Institution
Sch. of Inf. Manage. & Eng., Shanghai Univ. of Finance & Econ., China
Volume
2
fYear
2005
fDate
13-15 June 2005
Firstpage
1051
Abstract
Among data mining techniques, predictive classification has a wide range of applications. Practitioners build classification models to make predictions and hope to achieve high classification accuracy. However, datasets almost always contain data quality problems that affect the accuracy of classification models; missing data is a common example. In this paper, we investigate the influence of missing data on classifiers. First, basic concepts of data quality and sensitivity analysis are briefly introduced. Then, the sensitivity of six representative classifiers to missing data is studied through sensitivity experiments. The results indicate that when the proportion of missing data in a dataset exceeds 20%, it has a large adverse impact on the classification accuracy of the model. Moreover, missing data affects different datasets differently, depending on their characteristics. Among the six classifiers, the naive Bayesian classifier is the least sensitive to missing data.
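The kind of sensitivity experiment the abstract describes can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not say how missing values were introduced, which datasets were used, or how the classifiers were configured. Here missing cells are injected completely at random into the training data, the classifier is a small hand-written categorical naive Bayes that skips missing cells, and the toy dataset and all names (`inject_missing`, `NaiveBayes`, `sensitivity_curve`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

MISSING = None  # marker for a missing attribute value


def inject_missing(rows, proportion, rng):
    """Return a copy of rows with `proportion` of all cells set to MISSING."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    k = round(proportion * len(cells))
    damaged = [list(r) for r in rows]
    for i, j in rng.sample(cells, k):  # assumption: missing completely at random
        damaged[i][j] = MISSING
    return damaged


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing; missing cells are skipped."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (column, label) -> value counts
        self.values = defaultdict(set)            # column -> values seen in training
        for row, label in zip(rows, labels):
            for j, v in enumerate(row):
                if v is not MISSING:
                    self.value_counts[(j, label)][v] += 1
                    self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        total = sum(self.class_counts.values())
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for j, v in enumerate(row):
                if v is MISSING:
                    continue
                seen = self.value_counts[(j, label)]
                # The extra +1 reserves mass for values never seen in training.
                denom = sum(seen.values()) + len(self.values[j]) + 1
                score += math.log((seen[v] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def accuracy(model, rows, labels):
    hits = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def sensitivity_curve(train_rows, train_labels, test_rows, test_labels,
                      proportions, seed=0):
    """Train on increasingly damaged data and score on clean test data."""
    rng = random.Random(seed)
    curve = []
    for p in proportions:
        damaged = inject_missing(train_rows, p, rng)
        model = NaiveBayes().fit(damaged, train_labels)
        curve.append((p, accuracy(model, test_rows, test_labels)))
    return curve


# Hypothetical toy data: the label equals the first attribute; the rest is noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1, 2) for c in (0, 1, 2)] * 10
labels = [r[0] for r in rows]
test_rows, test_labels = rows[:18], labels[:18]  # the 18 distinct clean rows

curve = sensitivity_curve(rows, labels, test_rows, test_labels,
                          [0.0, 0.1, 0.2, 0.4])
for p, acc in curve:
    print(f"missing={p:.0%}  accuracy={acc:.3f}")
```

Plotting accuracy against the missing-data proportion for several classifiers and datasets would give the kind of comparison the abstract reports; the toy data here is far too simple to reproduce the 20% threshold or the ranking of classifiers.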
Keywords
backpropagation; belief networks; data mining; decision trees; sensitivity analysis; classification accuracy; classification models; data quality problems; missing data; naive Bayesian classifier; predictive classification; Classification algorithms; Data engineering; Data mining; Data warehouses; Databases; Delta modulation; Economic forecasting; Finance; Information management; Sensitivity analysis
fLanguage
English
Publisher
ieee
Conference_Titel
Services Systems and Services Management, 2005. Proceedings of ICSSSM '05. 2005 International Conference on
Print_ISBN
0-7803-8971-9
Type
conf
DOI
10.1109/ICSSSM.2005.1500155
Filename
1500155