Exploring re-identification risks in public domains

Author

Ramachandran, Aditi ; Singh, Lisa ; Porter, Edward ; Nagle, Frank

Author_Institution

Georgetown Univ., Washington, DC, USA

fYear

2012

fDate

16-18 July 2012

Firstpage

35

Lastpage

42

Abstract

While re-identification of sensitive data has been studied extensively, the emergence of online social networks and the popularity of digital communications have increased the ability to use public data for re-identification. This work begins by presenting two different case studies of sensitive data re-identification. We conclude that targeted re-identification using traditional variables is not only possible, but fairly straightforward given the large amount of public data available. However, our first case study also indicates that large-scale re-identification is less likely. We then consider methods for agencies such as the Census Bureau to identify variables that cause individuals to be vulnerable without testing all combinations of variables. We show the effectiveness of different strategies on a Census Bureau data set and on a synthetic data set.

Keywords

security of data; social networking (online); census bureau data set; data reidentification risk; digital communication; large-scale reidentification; online social network; public data; public domain; sensitive data reidentification; Accuracy; Data privacy; Databases; Facebook; Sociology; Twitter;

fLanguage

English

Publisher

ieee

Conference_Titel

Privacy, Security and Trust (PST), 2012 Tenth Annual International Conference on

Conference_Location

Paris

Print_ISBN

978-1-4673-2323-9

Electronic_ISBN

978-1-4673-2325-3

Type

conf

DOI

10.1109/PST.2012.6297917

Filename

6297917