DocumentCode
2754643
Title
Noise Correction using Bayesian Multiple Imputation
Author
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Seiffert, Chris ; Zhao, Lili
Author_Institution
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL
fYear
2006
fDate
16-18 Sept. 2006
Firstpage
478
Lastpage
483
Abstract
This work presents a novel procedure to detect and correct noise in a continuous dependent variable. The presence of noise in a dataset represents a significant challenge to data mining algorithms, as incorrect values in both the independent and dependent variables can severely corrupt the results of even robust learners. The problem of noise is especially severe when it is located in the dependent variable. In the worst case, severe noise in one of the independent variables can be handled by eliminating that attribute from the dataset, provided that the practitioner knows that noise is present. In the setting of supervised learning, the dependent variable is the most critical attribute in the dataset and therefore cannot be eliminated even if significant noise is present. Noise handling procedures for the dependent variable are therefore critical to the success of a supervised learning initiative. Compared with noise in a binary dependent variable or class, noise in a continuous dependent variable presents many additional difficulties. Our procedure to detect and correct noise in a continuous dependent variable uses Bayesian multiple imputation, which was initially developed to combat the problem of missing data. Our case study considers a real-world software measurement dataset called CCCS, which has a numeric dependent variable with inherent noise. Our experimental results are very encouraging and clearly demonstrate the utility of the procedure.
Keywords
belief networks; data mining; learning (artificial intelligence); Bayesian multiple imputation; command-control-communication system; data mining; noise correction; software measurement dataset; supervised learning; Bayesian methods; Computer science; Costs; Data mining; Databases; Laboratories; Noise robustness; Software engineering; Software measurement; Supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse and Integration, 2006 IEEE International Conference on
Conference_Location
Waikoloa Village, HI
Print_ISBN
0-7803-9788-6
Type
conf
DOI
10.1109/IRI.2006.252461
Filename
4018538