• DocumentCode
    1844916
  • Title
    Applying sensitivity analysis to missing data in classifiers
  • Author
    Lei, Lei ; Wu, Naijun ; Liu, Peng
  • Author_Institution
    Sch. of Inf. Manage. & Eng., Shanghai Univ. of Finance & Econ., China
  • Volume
    2
  • fYear
    2005
  • fDate
    13-15 June 2005
  • Firstpage
    1051
  • Abstract
    Among data mining techniques, predictive classification has a wide range of applications. Practitioners build classification models to make predictions and aim for high classification accuracy. However, datasets commonly suffer from data quality problems that reduce the accuracy of classification models; missing data is one such common problem. In this paper, we investigate the influence of missing data on classifiers. First, background on data quality and sensitivity analysis is briefly introduced. Then, the sensitivity of six representative classifiers to missing data is studied through sensitivity experiments. The results indicate that when the proportion of missing data in a dataset exceeds 20%, classification accuracy is severely degraded. Moreover, the effect of missing data varies across datasets according to their characteristics. Among the six classifiers, the naive Bayesian classifier is the least sensitive to missing data.
  • Keywords
    backpropagation; belief networks; data mining; decision trees; sensitivity analysis; classification accuracy; classification models; data quality problems; missing data; naive Bayesian classifier; predictive classification; Classification algorithms; Data engineering; Data warehouses; Databases; Delta modulation; Economic forecasting; Finance; Information management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Systems and Services Management, 2005. Proceedings of ICSSSM '05. 2005 International Conference on
  • Print_ISBN
    0-7803-8971-9
  • Type
    conf
  • DOI
    10.1109/ICSSSM.2005.1500155
  • Filename
    1500155