An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data

Author

Jianglin Huang;Hongyi Sun;Yan-Fu Li;Min Xie

Author_Institution

Dept. of Syst. Eng. &

fYear

2015

Firstpage

37

Lastpage

42

Abstract

Software quality prediction is an important yet difficult problem in software project development and management. Historical datasets can be used to build software quality prediction models; however, missing data significantly degrade the predictive ability of such models in knowledge discovery. Instead of ignoring incomplete observations, we investigate and improve incomplete-case k-nearest neighbor (kNN) imputation. kNN imputation is widely applied, but it has rarely been refined so that the most appropriate parameter settings are chosen for each individual imputation. We apply imputation to four well-known software quality datasets to evaluate the proposed method, and compare it with mean imputation and other commonly used variants of kNN imputation. The empirical results show that the proposed dynamic incomplete-case nearest neighbor imputation performs better when the missingness mechanism is missing completely at random or non-ignorable, regardless of the percentage of missing values.
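The abstract contrasts incomplete-case kNN imputation with mean imputation. The sketch below illustrates both baselines, assuming the common formulation of incomplete-case kNN. In that formulation, a donor row may itself be incomplete as long as it observes the attribute being imputed, and distance is computed only over attributes observed in both rows. It uses a fixed, hypothetical `k` and does not reproduce the paper's dynamic per-imputation parameter selection. The function names `icknn_impute` and `mean_impute`, and the `None`-for-missing encoding, are illustrative assumptions.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (illustrative encoding)


def distance(a: Row, b: Row) -> Optional[float]:
    """Root-mean-square difference over attributes observed in both rows.
    Returns None when the rows share no observed attribute."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    # Dividing by the overlap size keeps rows with different overlaps comparable.
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def icknn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Incomplete-case kNN with a fixed k: donors may be incomplete rows,
    provided they observe the attribute being imputed."""
    result = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue  # a donor must observe attribute j
                d = distance(row, other)  # computed on original, unimputed data
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Unweighted mean of the k nearest donors' values.
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


def mean_impute(data: List[Row]) -> List[Row]:
    """Baseline: replace each missing value with its column's observed mean."""
    result = [row[:] for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            continue
        mu = sum(observed) / len(observed)
        for row in result:
            if row[j] is None:
                row[j] = mu
    return result


if __name__ == "__main__":
    # Made-up toy data, not from the paper's datasets.
    data = [
        [1.0, 2.0, 10.0],
        [1.1, None, 11.0],
        [5.0, 6.0, None],
        [1.2, 2.2, 12.0],
        [5.1, 6.1, 50.0],
    ]
    print(icknn_impute(data, k=2))
    print(mean_impute(data))
```

On this toy data, row 2 is missing its third attribute. Its nearest donors (rows 4 and 3) yield 31.0, whereas the column mean is 20.75. This illustrates that neighbor-based imputation uses local similarity rather than a global average. Row 1 can still donate to other rows despite being incomplete.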

Keywords

"Software quality","Nickel","Software engineering","Measurement","Estimation","Predictive models"

Publisher

ieee

Conference_Titel

Software Quality, Reliability and Security (QRS), 2015 IEEE International Conference on

Type

conf

DOI

10.1109/QRS.2015.16

Filename

7272912