Title :
An evaluation of k-nearest neighbour imputation using Likert data
Author :
Jönsson, Per ; Wohlin, Claes
Author_Institution :
School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden
Abstract :
Studies in many different fields of research suffer from the problem of missing data. With missing data, statistical tests lose power, results may be biased, or analysis may not be feasible at all. There are several ways to handle the problem, for example through imputation. With imputation, missing values are replaced with estimated values according to an imputation method or model. In the k-nearest neighbour (k-NN) method, a case is imputed using values from the k most similar cases. In this paper, we present an evaluation of the k-NN method using Likert data in a software engineering context. We simulate the method with different values of k and for different percentages of missing data. Our findings indicate that it is feasible to use the k-NN method with Likert data. We suggest that a suitable value of k is approximately the square root of the number of complete cases. We also show that by relaxing the method's rules for selecting neighbours, the method's ability to impute missing values remains high for large amounts of missing data, without affecting the quality of the imputations.
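To make the method summarised in the abstract concrete, here is a minimal Python sketch of k-NN imputation for Likert data. It is not the authors' exact procedure; it rests on several assumptions. Distance is Euclidean over the attributes the target case has observed. The imputed value is the lower median of the neighbours' values, so it stays on the Likert scale. k defaults to the rounded square root of the number of complete cases. The "relaxed" rule admits incomplete cases as neighbours when they have the imputed attribute and every attribute the target case has observed. A value is left missing when fewer than k candidate neighbours exist. The names `knn_impute`, `choose_k` and `relaxed` are hypothetical.

```python
import math
from statistics import median_low
from typing import List, Optional

Row = List[Optional[int]]  # one case: Likert responses, None marks a missing value


def distance(a: Row, b: Row, attrs: List[int]) -> float:
    """Euclidean distance over the given attribute indices (assumed metric)."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in attrs))


def choose_k(data: List[Row]) -> int:
    """k ~ sqrt(number of complete cases), following the abstract's suggestion."""
    n_complete = sum(1 for row in data if None not in row)
    return max(1, round(math.sqrt(n_complete)))


def knn_impute(data: List[Row], k: Optional[int] = None, relaxed: bool = False) -> List[Row]:
    """Return a copy of data with missing values imputed where possible.

    Strict mode (assumed): only complete cases may serve as neighbours.
    Relaxed mode (assumed): any other case may serve if it has the attribute
    being imputed and all attributes the target case has observed.
    """
    if k is None:
        k = choose_k(data)
    result = [list(row) for row in data]  # never modify the input
    for r, row in enumerate(data):
        observed = [i for i, v in enumerate(row) if v is not None]
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for s, other in enumerate(data):
                if s == r or other[j] is None:
                    continue
                if relaxed:
                    usable = all(other[i] is not None for i in observed)
                else:
                    usable = None not in other
                if usable:
                    candidates.append(other)
            # Assumption: skip the value if the case has nothing to compare on
            # or too few neighbours exist; it then stays missing.
            if not observed or len(candidates) < k:
                continue
            candidates.sort(key=lambda o: distance(row, o, observed))
            neighbours = candidates[:k]
            # Lower median keeps the imputed value an actual Likert level.
            result[r][j] = median_low([o[j] for o in neighbours])
    return result
```

For example, with four complete cases the sketch picks k = 2; imputing `[1, 2, None]` from neighbours `[1, 2, 3]` and `[1, 2, 4]` gives the lower median, 3. Using the lower median rather than the mean is a design choice for ordinal data: averaging Likert levels can produce values that are not on the scale.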
Keywords :
data analysis; software performance evaluation; software quality; statistical testing; Likert data; k-nearest neighbour imputation; software engineering; statistical tests; Artificial intelligence; Data analysis; Data engineering; Machine learning; Performance evaluation; Power engineering and energy; Psychology; Software architecture; Software engineering; Testing
Conference_Title :
Proceedings of the 10th International Symposium on Software Metrics, 2004
Print_ISBN :
0-7695-2129-0
DOI :
10.1109/METRIC.2004.1357895