DocumentCode :
827216
Title :
On the use of conceptual reconstruction for mining massively incomplete data sets
Author :
Parthasarathy, Srinivasan ; Aggarwal, Charu C.
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
Volume :
15
Issue :
6
fYear :
2003
Firstpage :
1512
Lastpage :
1521
Abstract :
Incomplete data sets have become almost ubiquitous in a wide variety of application domains. Common examples can be found in climate and image data sets, sensor data sets, and medical data sets. The incompleteness in these data sets may arise from a number of factors: in some cases, it may simply reflect certain measurements not being available at the time; in others, the information may be lost due to partial system failure, or it may be a result of users being unwilling to specify attributes due to privacy concerns. When a significant fraction of the entries are missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of conceptual reconstruction, in which we create effective conceptual representations on which the data mining algorithms can be directly applied. The appeal of conceptual reconstruction is that it uses the correlation structure of the data to express it in terms of concepts rather than the original dimensions. As a result, the reconstruction procedure estimates only those conceptual aspects of the data which can be mined from the incomplete data set, rather than forcing errors created by extrapolation. We demonstrate the effectiveness of the approach on a variety of real data sets.
Keywords :
data mining; climate data sets; conceptual reconstruction; correlation structure; data mining algorithms; image data sets; massively incomplete data set mining; medical data sets; sensor data sets; Application software; Biomedical imaging; Computer Society; Data mining; Data privacy; Extrapolation; Image reconstruction; Image sensors; Loss measurement; Reflection;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2003.1245289
Filename :
1245289