• DocumentCode
    2754643
  • Title

    Noise Correction using Bayesian Multiple Imputation

  • Author

    Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Seiffert, Chris ; Zhao, Lili

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL
  • fYear
    2006
  • fDate
    16-18 Sept. 2006
  • Firstpage
    478
  • Lastpage
    483
  • Abstract
    This work presents a novel procedure to detect and correct noise in a continuous dependent variable. The presence of noise in a dataset represents a significant challenge to data mining algorithms, as incorrect values in both the independent and dependent variables can severely corrupt the results of even robust learners. The problem of noise is especially severe when it is located in the dependent variable. In the worst case, severe noise in one of the independent variables can be handled by eliminating that attribute from the dataset, provided that the practitioner knows that noise is present. In the setting of supervised learning, the dependent variable is the most critical attribute in the dataset and therefore cannot be eliminated even if significant noise is present. Noise-handling procedures for the dependent variable are therefore critical to the success of a supervised learning initiative. Compared with noise in a binary dependent variable or class, noise in a continuous dependent variable presents many additional difficulties. Our procedure to detect and correct noise in a continuous dependent variable uses Bayesian multiple imputation, which was initially developed to combat the problem of missing data. Our case study considers a real-world software measurement dataset called CCCS, which has a numeric dependent variable with inherent noise. The results of our experiments are very encouraging and clearly demonstrate the utility of our procedure.
  • Keywords
    belief networks; data mining; learning (artificial intelligence); Bayesian multiple imputation; command-control-communication system; data mining; noise correction; software measurement dataset; supervised learning; Bayesian methods; Computer science; Costs; Data mining; Databases; Laboratories; Noise robustness; Software engineering; Software measurement; Supervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2006 IEEE International Conference on
  • Conference_Location
    Waikoloa Village, HI
  • Print_ISBN
    0-7803-9788-6
  • Type

    conf

  • DOI
    10.1109/IRI.2006.252461
  • Filename
    4018538