DocumentCode :
1242514
Title :
Empirical Case Studies in Attribute Noise Detection
Author :
Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL
Volume :
39
Issue :
4
fYear :
2009
fDate :
7/1/2009
Firstpage :
379
Lastpage :
388
Abstract :
The quality of data is an important issue in any domain-specific data mining and knowledge discovery initiative. The validity of solutions produced by data-driven algorithms can be diminished if the data being analyzed are of low quality. The quality of data is often realized in terms of data noise present in the given dataset and can include noisy attributes or labeling errors. Hence, tools for improving the quality of data are important to the data mining analyst. We present a comprehensive empirical investigation of our new and innovative technique for ranking attributes in a given dataset from most to least noisy. Upon identifying the noisy attributes, specific treatments can be applied depending on how the data are to be used. In a classification setting, for example, if the class label is determined to contain the most noise, processes to cleanse this important attribute may be undertaken. Independent variables or predictors that have a low correlation to the class attribute and appear noisy may be eliminated from the analysis. Several case studies using both real-world and synthetic datasets are presented in this study. The noise detection performance is evaluated by injecting noise into multiple attributes at different noise levels. The empirical results demonstrate conclusively that our technique provides a very accurate and useful ranking of noisy attributes in a given dataset.
Keywords :
data analysis; data mining; attribute noise detection; data quality; domain-specific data mining; knowledge discovery; attribute noise; data cleaning; noise detection; pairwise attribute noise detection algorithm (PANDA)
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
Publisher :
IEEE
ISSN :
1094-6977
Type :
jour
DOI :
10.1109/TSMCC.2009.2013815
Filename :
4815435