DocumentCode :
3029296
Title :
Software effort estimation with a generalized robust linear regression technique
Author :
Lavazza, L. ; Morasca, S.
Author_Institution :
Dipt. di Sci. Teoriche e Applicate, Univ. degli Studi dell'Insubria, Varese, Italy
fYear :
2012
fDate :
14-15 May 2012
Firstpage :
206
Lastpage :
215
Abstract :
Background. Outliers and corrupted data points may unduly bias software development effort estimation models. However, given the usually limited size of software engineering data sets, removing too many data points may seriously reduce the power of the statistical tests used and the likelihood of statistically significant results. Also, statistical techniques are typically based on assumptions that are either believed to be true a priori or, at best, checked via statistical tests, without ever achieving 100% certainty about their truth. Estimation models based on less strict assumptions have broader applicability and lower risks of drawing unwarranted conclusions. Aim. We investigate the usefulness of Robust Regression when building effort estimation models, by varying the degree of robustness and, thus, the number of data points that are excluded from the data analysis as outliers. Method. We have used Least Quantile of Squares (LQS) Robust Regression, a generalization of the Least Median of Squares (LMS). LMS builds a regression line by minimizing the median squared residual. LQS minimizes the order statistic of squared residuals corresponding to any specified quantile, and not just the median, which is the order statistic corresponding to the 50% quantile. We have extended a statistical significance test for univariate LQS regression models. We have also built a weighted model, obtained from statistically significant LQS models, where each LQS model contributes proportionally to the quantile used. Results. We have applied LQS Linear Regression to estimate development effort on four projects from the PROMISE data set and obtained valid and significant univariate models. Conclusions. LQS may provide a valid alternative to LMS and Ordinary Least Squares regression to build estimation models when (1) balancing the ne
Keywords :
data analysis; least squares approximations; regression analysis; software metrics; LMS; PROMISE data set; corrupted data points; generalized robust linear regression technique; least median of squares; least quantile of squares; median squared residual minimization; ordinary least square regressions; outliers; robustness degree; software development effort estimation models; software engineering data sets; square residual order statistic minimization; statistical techniques; univariate LQS regression models; weighted model; defect prediction; effort prediction; robust regression; statistical significance; weighted models;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Evaluation & Assessment in Software Engineering (EASE 2012), 16th International Conference on
Conference_Location :
Ciudad Real
Electronic_ISBN :
978-1-84919-541-6
Type :
conf
DOI :
10.1049/ic.2012.0027
Filename :
6272516
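Illustrative sketch :
The LQS criterion described in the abstract (minimize the order statistic of the squared residuals at a chosen quantile) can be illustrated with a small Python sketch. This is not the authors' implementation: it approximates the fit by trying every line through a pair of data points, and the function name `lqs_fit`, the rule mapping a quantile to an order-statistic index, and the toy data in the usage note are all assumptions made for illustration. The significance test and the weighted model mentioned in the abstract are not covered.

```python
from itertools import combinations
import math


def lqs_fit(xs, ys, quantile=0.5):
    """Approximate univariate LQS fit (hypothetical helper, not from the paper).

    Searches the lines through every pair of points and keeps the one whose
    squared residual at the requested quantile is smallest.
    quantile=0.5 corresponds to the LMS (median) case.
    """
    n = len(xs)
    # Assumed convention: the q-quantile maps to the ceil(q * n)-th smallest
    # squared residual (converted here to a 0-based index).
    k = min(n - 1, max(0, math.ceil(quantile * n) - 1))

    best = None
    for i, j in combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b * x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        squared = sorted((y - (intercept + slope * x)) ** 2
                         for x, y in zip(xs, ys))
        criterion = squared[k]
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)

    if best is None:
        raise ValueError("need at least two points with distinct x values")
    criterion, slope, intercept = best
    return slope, intercept, criterion
```

As a usage example on made-up data, take `xs = list(range(10))` and `ys = [1, 3, 5, 7, 9, 11, 13, 15, 100, -50]`: eight points lie on y = 2x + 1 and two are gross outliers. Calling `lqs_fit(xs, ys, quantile=0.5)` or `lqs_fit(xs, ys, quantile=0.8)` recovers slope 2 and intercept 1, because the criterion ignores the largest residuals. Raising the quantile brings more points into the criterion, which is the robustness trade-off the abstract refers to.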