Regional Information Center for Science and Technology - On Density Based Transforms for Uncertain Data Mining

DocumentCode :

2731018

Title :

On Density Based Transforms for Uncertain Data Mining

Author :

Aggarwal, Charu C.

Author_Institution :

IBM TJ Watson Res. Center, Hawthorne, NY, USA

fYear :

2007

fDate :

15-20 April 2007

Firstpage :

866

Lastpage :

875

Abstract :

In spite of the great progress in the data mining field in recent years, the problem of missing and uncertain data has remained a great challenge for data mining algorithms. Many real data sets have missing attribute values or values which are only approximately measured or imputed. In some methodologies such as privacy preserving data mining, it is desirable to explicitly add perturbations to the data in order to mask sensitive values. If the underlying data is not of high quality, one cannot expect the corresponding algorithms to perform effectively. In many cases, it may be possible to obtain quantitative measures of the errors in different entries of the data. In this paper, we will show that this is very useful information for the data mining process, since it can be leveraged to improve the quality of the results. We discuss a new method for handling error-prone and missing data with the use of density based approaches to data mining. We discuss methods for constructing error-adjusted densities of data sets, and using these densities as intermediate representations in order to perform more accurate mining. We discuss the mathematical foundations behind the method and establish ways of extending it to very large scale data mining problems. As a concrete example of our technique, we show how to apply the intermediate density representation in order to accurately solve the classification problem. We show that the error-based method can be effectively and efficiently applied to very large data sets, and turns out to be very useful as a general approach to such problems.

Keywords :

data handling; data mining; data sets; density-based transforms; error-prone data handling; mathematical foundations; missing data handling; privacy preserving data mining; uncertain data mining; Concrete; Data mining; Data privacy; Demography; Feature extraction; Large-scale systems; Marketing and sales; Nearest neighbor searches; Statistical analysis; Testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on

Conference_Location :

Istanbul

Print_ISBN :

1-4244-0802-4

Type :

conf

DOI :

10.1109/ICDE.2007.367932

Filename :

4221735

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2731018