Addressing Missing Attributes during Data Mining Using Frequent Itemsets and Rough Set Based Predictions

Author

Li, Jiye ; Cercone, Nick ; Cohen, Robin

Author_Institution

York Univ., Toronto

fYear

2007

fDate

2-4 Nov. 2007

Firstpage

294

Lastpage

294

Abstract

In this paper, we present an improved method for predicting missing attribute values in data sets. We use frequent itemsets, generated by an association rule algorithm, which capture the correlations between items in a set of transactions. In particular, we treat a database as a set of transactions and each data instance as an itemset, so that the frequent itemsets can serve as a knowledge base for predicting missing attribute values. Our approach integrates the RSFit method, based on rough set theory, which produces faster predictions by considering similarities of attribute-value pairs, but only for those attributes contained in the core or a reduct of the data set. In empirical studies on UCI and other real-world data sets, we demonstrate a significant increase in prediction accuracy with the new integrated approach, referred to as ItemRSFit.
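
The idea of using frequent itemsets as a knowledge base for missing values can be illustrated with a minimal Python sketch. It is not the authors' ItemRSFit implementation: the function names (frequent_itemsets, predict_missing), the toy records, the brute-force counting, and the rule of preferring the largest matching itemset and then the highest support are all assumptions made for illustration, and the RSFit step (restricting to core or reduct attributes) is not modelled.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(rows, min_support, max_size=3):
    """Brute-force count of (attribute, value) sets; keep those seen >= min_support times."""
    counts = Counter()
    for row in rows:
        # Each data instance is treated as an itemset of attribute-value pairs.
        items = [(a, v) for a, v in row.items() if v is not None]
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def predict_missing(row, target, itemsets):
    """Pick the target value from the best frequent itemset that agrees with the known values."""
    known = {(a, v) for a, v in row.items() if v is not None and a != target}
    best = None  # (number of matched known items, support, predicted value)
    for itemset, support in itemsets.items():
        target_items = {(a, v) for a, v in itemset if a == target}
        if len(target_items) != 1:
            continue  # itemset says nothing about the target attribute
        rest = itemset - target_items
        if not rest or not rest <= known:
            continue  # itemset needs evidence the record does not have
        value = next(iter(target_items))[1]
        candidate = (len(rest), support, value)
        # Assumed preference: more matched items first, then higher support.
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    return None if best is None else best[2]

# Hypothetical toy data: complete records act as the transaction database.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
itemsets = frequent_itemsets(rows, min_support=2)

incomplete = {"outlook": "sunny", "windy": "no", "play": None}
print(predict_missing(incomplete, "play", itemsets))
```

When no frequent itemset matches the known values of a record, predict_missing returns None, leaving the choice of a fallback strategy to the caller.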

Keywords

data mining; rough set theory; ItemRSFit; RSFit method; association rules algorithm; frequent itemsets; knowledge base; missing attribute value prediction; missing attributes; rough set based predictions; rough sets theory; Accuracy; Association rules; Data preprocessing; Data privacy; Design for experiments; Itemsets; Rough sets; Testing; Transaction databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Granular Computing, 2007. GRC 2007. IEEE International Conference on

Conference_Location

Fremont, CA

Print_ISBN

978-0-7695-3032-1

Type

conf

DOI

10.1109/GrC.2007.144

Filename

4403113