• DocumentCode
    2729738
  • Title
    On Randomization, Public Information and the Curse of Dimensionality
  • Author
    Aggarwal, Charu C.
  • Author_Institution
    IBM T. J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Firstpage
    136
  • Lastpage
    145
  • Abstract
    A key method for privacy-preserving data mining is randomization. Unlike k-anonymity, this technique does not include public information in its underlying assumptions. In this paper, we provide a first comprehensive analysis of the randomization method in the presence of public information. We define a quantification of the randomization method, which we refer to as k-randomization of the data. Including public information in the theoretical analysis of the randomization method leads to a number of interesting and insightful conclusions, which expose some vulnerabilities of the method. We show that the randomization method cannot effectively achieve privacy in the high-dimensional case, and we theoretically quantify the degree of randomization required to guarantee privacy as a function of the underlying data dimensionality. Furthermore, we show that the randomization method is susceptible to many natural properties of real data sets, such as clusters or outliers. Finally, we show that the use of public information makes the choice of the perturbing distribution critical in a number of subtle ways. Our analysis shows that including public information makes privacy preservation a more elusive goal for the randomization method than previously thought.
  • Keywords
    data mining; data privacy; database theory; randomised algorithms; data dimensionality; data privacy preservation; public information; randomization method; Aggregates; Algorithm design and analysis; Batteries; Couplings; Data mining; Data privacy; Databases; Government; Information analysis; Spectral analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2007 IEEE 23rd International Conference on Data Engineering (ICDE 2007)
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0802-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2007.367859
  • Filename
    4221662