Title :
Empirical case studies in attribute noise detection
Author :
Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Dept. of Comput. Eng., Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
Determining the noisiest attribute(s) in a set of domain-specific attributes is of practical importance to domain experts and the data mining community. Data noise is generally of two types: attribute noise and mislabeling errors (class noise). For a given domain-specific dataset, attributes that contain a significant amount of noise can have a detrimental impact on the success of a data mining initiative, e.g., by reducing the predictive ability of a classifier in a supervised learning task. Techniques that provide information about the noise level of an attribute are useful tools for a data mining practitioner when analyzing a dataset or scrutinizing the data collection process. Our technique for detecting noisy attributes uses an algorithm that we recently proposed for detecting instances with attribute noise. This paper presents case studies that confirm our recent work on detecting noisy attributes and further validate that our technique is able to detect attributes that contain noise.
Keywords :
data mining; database management systems; attribute noise; attribute noise detection; class noise; data collection process; domain experts; domain-specific attributes; domain-specific dataset; mislabeling errors; supervised learning task; Computer aided software engineering; Computer errors; Computer science; Data analysis; Data engineering; Information analysis; Noise reduction; Performance analysis; Supervised learning
Conference_Title :
IEEE International Conference on Information Reuse and Integration, 2005 (IRI-2005)
Print_ISBN :
0-7803-9093-8
DOI :
10.1109/IRI-05.2005.1506475