DocumentCode
3658034
Title
An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data
Author
Jianglin Huang;Hongyi Sun;Yan-Fu Li;Min Xie
Author_Institution
Dept. of Syst. Eng. &
fYear
2015
Firstpage
37
Lastpage
42
Abstract
Software quality prediction is an important yet difficult problem in software project development and management. Historical datasets can be used to build models for software quality prediction. However, missing data significantly degrade the predictive ability of such models in knowledge discovery. Instead of ignoring observations with missing values, we investigate and improve incomplete-case k-nearest neighbor (kNN) imputation. Although kNN imputation is widely applied, it has rarely been refined to select the most appropriate parameter settings for each individual imputation. This work performs imputation on four well-known software quality datasets to evaluate the impact of the proposed imputation method, and compares it with mean imputation and other commonly used variants of kNN imputation. The empirical results show that the proposed dynamic incomplete-case nearest neighbor imputation performs better than the compared methods when values are missing completely at random or when the missingness is non-ignorable, regardless of the percentage of missing values.
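The abstract contrasts incomplete-case kNN imputation with mean imputation. The key difference from complete-case kNN is that a donor record only needs the target attribute to be observed, not every attribute. The following is a minimal sketch of that generic idea under stated assumptions. It is *not* the authors' dynamic method: the abstract does not say how parameters are chosen per imputation, so `k` is fixed here. Missing values are represented as `None`, and the distance (root mean squared difference over attributes both records observe) is an assumed choice. All function names are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumed encoding)


def mean_impute(data: List[Row]) -> List[Row]:
    """Baseline: replace each missing value with its column's observed mean."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def partial_distance(a: Row, b: Row) -> Optional[float]:
    """Assumed metric: RMS difference over attributes observed in BOTH rows.

    Returns None when the rows share no observed attribute."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute_incomplete_case(data: List[Row], k: int = 3) -> List[Row]:
    """Generic incomplete-case kNN imputation with a fixed k (hypothetical helper).

    A donor qualifies if it observes the target attribute, even when it has
    other attributes missing. Distances use the original data only, so earlier
    imputations do not influence later ones. A cell with no usable donor is
    left as None."""
    result = [list(row) for row in data]
    for i, target in enumerate(data):
        for j, value in enumerate(target):
            if value is not None:
                continue
            candidates = []
            for d, donor in enumerate(data):
                if d == i or donor[j] is None:
                    continue  # donor must observe attribute j
                dist = partial_distance(target, donor)
                if dist is not None:
                    candidates.append((dist, donor[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Impute with the unweighted mean of the k nearest donors' values
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [5.0, 6.0, None],
        [1.2, 2.2, None],
        [None, 6.1, 7.0],
    ]
    print(knn_impute_incomplete_case(data, k=2))
    print(mean_impute(data))
```

In this toy data, no row except the first is complete, so complete-case kNN would have only one donor. The incomplete-case variant can still draw on rows 1 to 4 as donors for each other's missing cells.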
Keywords
"Software quality","Nickel","Software engineering","Measurement","Estimation","Predictive models"
Publisher
ieee
Conference_Title
Software Quality, Reliability and Security (QRS), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/QRS.2015.16
Filename
7272912