Title :
Polishing blemishes: issues in data correction
Author_Institution :
Inst. for Human & Machine Cognition, Pensacola, FL, USA
Abstract :
Data quality is crucial to any data analysis task. When using information collected from channels susceptible to disturbances, data quality is a concern, especially when the primary objective is to assimilate and understand the data. Imperfections can arise from many sources, including transmission and bandwidth constraints, faults in sensor devices, irregularities in sampling, and transcription errors. Many imperfection-handling techniques avoid overfitting the noise or simply remove the offending portions of the data. Polishing, in contrast, identifies blemishes in the data and makes corrections, retaining and recovering as much information as possible. An intuitive application that exemplifies handling data imperfections is the spell-checker; developing one requires techniques for repairing data imperfections. We are exploring such techniques through a data correction method called polishing. Here, we compare polishing to two alternative approaches to handling data imperfections, focusing on how to evaluate and validate data correction mechanisms.
Keywords :
data analysis; data handling; data integrity; data mining; bandwidth constraint; data analysis task; data correction method; data imperfection-handling technique; data quality; spell-checker; transcription error; Bandwidth; Cognition; Data analysis; Data mining; Filtering; Humans; Intelligent sensors; Machine learning algorithms; Noise robustness; Sampling methods
Journal_Title :
Intelligent Systems, IEEE
DOI :
10.1109/MIS.2004.1274909