Title :
Handling Noisy Data using Attribute Selection and Smart Tokens
Author :
Tamilselvi, Jebamalar J. ; Saravanan, V.
Author_Institution :
Dept. of Comput. Applic., Karunya Univ., Coimbatore
fDate :
Aug. 29 2008-Sept. 2 2008
Abstract :
Data cleaning is the process of identifying the problems that arise when data is integrated from a single source or from several different sources. Many problems can occur in a data warehouse while data is being loaded or integrated, and the main one is noisy data. Noisy data results from misused abbreviations, data entry mistakes, duplicate records, and spelling errors. The proposed algorithm is intended to handle noisy data efficiently by expanding abbreviations, removing unimportant characters, and eliminating duplicates. An attribute selection algorithm chooses the attributes before tokens are formed. Together, the attribute selection and token formation algorithms reduce the complexity of the data cleaning process and allow data to be cleaned flexibly. This research work uses smart tokens to increase the speed of the mining process and to improve the quality of the data.
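The cleaning steps named in the abstract (abbreviation expansion, removal of unimportant characters, duplicate elimination on tokens built from selected attributes) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no details, so the abbreviation table, the function names `smart_token` and `deduplicate`, the sample records, and the rule of keeping the first record per token key are all assumptions made for illustration. The attribute selection step is not modelled; the caller simply supplies the attribute names. Spelling errors are not handled.

```python
import re

# Hypothetical abbreviation table; the abstract does not list the real expansions.
ABBREVIATIONS = {"dept": "department", "univ": "university", "st": "street"}


def smart_token(value: str) -> str:
    """Build a normalized token from one attribute value (illustrative only)."""
    # Lowercase, then replace every character that is not a letter, digit,
    # or whitespace with a space ("unimportant characters").
    cleaned = re.sub(r"[^a-z0-9\s]", " ", value.lower())
    # Expand known abbreviations word by word.
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    # Concatenate the words into a single compact token.
    return "".join(words)


def deduplicate(records, selected_attributes):
    """Keep the first record for each key of tokens over the selected attributes."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(
            smart_token(str(record.get(attr, ""))) for attr in selected_attributes
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data: the first two records differ only in noise.
records = [
    {"name": "Karunya Univ.", "city": "Coimbatore", "note": "first entry"},
    {"name": "karunya university", "city": "COIMBATORE", "note": "re-keyed"},
    {"name": "Anna Univ.", "city": "Chennai", "note": ""},
]
cleaned = deduplicate(records, ["name", "city"])
```

In this sketch only `name` and `city` contribute to the token key, so the differing `note` values do not prevent the first two records from being treated as duplicates. Comparing compact tokens instead of raw strings is what makes the duplicate check a simple set lookup.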
Keywords :
data integrity; data mining; data warehouses; attribute selection; data cleaning; data integration; data warehouse; noisy data handling; smart tokens; Cleaning; Computer applications; Computer science; Databases; Information resources; Information technology; Sorting; Data Quality; Data Warehousing
Conference_Title :
Computer Science and Information Technology, 2008. ICCSIT '08. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3308-7
DOI :
10.1109/ICCSIT.2008.62