• DocumentCode
    3250982
  • Title

    De-anonymizing private data by matching statistics

  • Author

    Unnikrishnan, Jayakrishnan ; Naini, Farid M.

  • Author_Institution
    Sch. of Comput. & Commun. Sci., Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
  • fYear
    2013
  • fDate
    2-4 Oct. 2013
  • Firstpage
    1616
  • Lastpage
    1623
  • Abstract
    Recent research has illustrated privacy breaches that can be effected on an anonymized dataset by an attacker who has access to auxiliary information about the users. Most of these attack strategies rely on the uniqueness of specific aspects of the users´ data - e.g., observing a mobile user at just a few points on the time-location space are sufficient to uniquely identify him/her from an anonymized set of users. In this work, we consider de-anonymization attacks on anonymized summary statistics in the form of histograms. Such summary statistics are useful for many applications that do not need knowledge about exact user behavior. We consider an attacker who has access to an anonymized set of histograms of K users´ data and an independent set of data belonging to the same users. Modeling the users´ data as i.i.d., we study the composite hypothesis testing problem of identifying the correct matching between the anonymized histograms from the first set and the user data from the second. We propose a Generalized Likelihood Ratio Test as a solution to this problem and show that the solution can be identified using a minimum weight matching algorithm on an K × K complete bipartite weighted graph. We show that a variant of this solution is asymptotically optimal as the data lengths are increased.We apply the algorithm on mobility traces of over 1000 users on EPFL campus collected during two weeks and show that up to 70% of the users can be correctly matched. These results show that anonymized summary statistics of mobility traces themselves contain a significant amount of information that can be used to uniquely identify users by an attacker who has access to auxiliary information about the statistics.
  • Keywords
    data privacy; graph theory; mobile computing; statistical testing; EPFL campus; anonymized histogram set access; anonymized summary statistics; anonymized user data set; asymptotic optimality; auxiliary information access; complete bipartite weighted graph; composite hypothesis testing problem; data lengths; generalized likelihood ratio test; histograms; independent data set; minimum weight matching algorithm; mobile user; mobility traces; privacy breach; private data de-anonymization attack strategies; time-location space; user data modeling; Accuracy; Bipartite graph; Databases; Histograms; Testing; Trajectory; Wireless communication;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on
  • Conference_Location
    Monticello, IL
  • Print_ISBN
    978-1-4799-3409-6
  • Type

    conf

  • DOI
    10.1109/Allerton.2013.6736722
  • Filename
    6736722