DocumentCode
12782
Title
On the Analytical Properties of High-Dimensional Randomization
Author
Aggarwal, Charu C.
Author_Institution
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
25
Issue
7
fYear
2013
fDate
July 2013
Firstpage
1628
Lastpage
1642
Abstract
In this paper, we provide the first comprehensive analysis of high-dimensional randomization. The goal is to examine the strengths and weaknesses of randomization and to explore both its potential and its pitfalls in the high-dimensional case. Our theoretical analysis leads to several conclusions. 1) The privacy effects of randomization reduce rapidly with increasing dimensionality. 2) The properties of the underlying data set can affect the anonymity level of the randomization method. For example, natural properties of real data sets, such as clustering, improve the effectiveness of randomization. On the other hand, variations in the data density of nonempty data localities, as well as outliers, create privacy preservation challenges for the randomization method. 3) The use of a public information-sensitive attack method makes the choice of perturbing distribution more critical than previously thought. In particular, Gaussian perturbations are significantly more effective than uniformly distributed perturbations in the high-dimensional case. We use the insights gained from this analysis to suggest future research directions for improvements and extensions of the randomization method.
Keywords
Gaussian distribution; data privacy; perturbation techniques; public information systems; Gaussian perturbations; analytical properties; anonymity level randomization method; data density; data set; distributed perturbations; high-dimensional randomization comprehensive analysis; natural properties; nonempty data localities; privacy effects; privacy preservation challenges; public information-sensitive attack method; theoretical analysis; Aggregates; Context; Data privacy; Databases; Noise; Privacy; high-dimensional data; randomization
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2012.98
Filename
6200272