DocumentCode
3352144
Title
A technique for the quantitative measure of data cleanliness
Author
Wakchaure, Abhijit ; Eaglin, Ronald ; Motlagh, Bahman
Author_Institution
Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL
fYear
2008
fDate
21-24 Sept. 2008
Firstpage
1258
Lastpage
1263
Abstract
With the amount of data that is collected, viewed, processed, and stored today, techniques for analyzing the accuracy of data are extremely important. Since we cannot improve what we cannot measure, a tangible quantitative measure of data quality is a necessity. This paper focuses on a data-cleanliness algorithm, which makes use of the 'Levenshtein distance', to measure the data quality of a criminal records database. Actual law enforcement name records were used for this research. The results help us arrive at the extent of dirtiness in the data, and also highlight the different types of dirty data. We then go on to show how measuring the data quality not only helps in setting up guidelines for the data clean-up process, but can also be used as a metric for cross-comparing like databases.
Keywords
data analysis; data mining; Levenshtein distance; criminal records database; data accuracy; data cleanliness; data quality; Bismuth; Computer science; Costs; Data engineering; Data mining; Data warehouses; Databases; Electric variables measurement; Guidelines; Law enforcement; dirty data
fLanguage
English
Publisher
IEEE
Conference_Titel
Cybernetics and Intelligent Systems, 2008 IEEE Conference on
Conference_Location
Chengdu
Print_ISBN
978-1-4244-1673-8
Electronic_ISBN
978-1-4244-1674-5
Type
conf
DOI
10.1109/ICCIS.2008.4670930
Filename
4670930