DocumentCode :
2412034
Title :
Measuring Disclosure Risk for Multimethod Synthetic Data Generation
Author :
Larsen, Michael D. ; Huckett, Jennifer C.
Author_Institution :
Dept. of Stat., George Washington Univ., Washington, DC, USA
fYear :
2010
fDate :
20-22 Aug. 2010
Firstpage :
808
Lastpage :
815
Abstract :
Government agencies must simultaneously maintain the confidentiality of individual records and disseminate useful microdata. We propose a method for creating synthetic data that combines quantile regression, hot deck imputation, and rank swapping. Implementing the proposed procedure yields a releasable data set containing original values for a few key variables, synthetic quantile regression predictions for several variables, and imputed and perturbed values for the remaining variables. To measure disclosure risk in the resulting synthetic data set, we extend existing probabilistic risk measures that aim to imitate an intruder attempting to match a record in the released data with information previously available on a target respondent.
Keywords :
government data processing; regression analysis; risk analysis; security of data; disclosure risk measurement; government agencies; hot deck imputation; multimethod synthetic data generation; probabilistic risk measures; rank swapping; synthetic quantile regression; Biological system modeling; Computational modeling; Data models; Equations; Joints; Mathematical model; Predictive models; hot deck imputation; quantile regression; rank swapping; statistical disclosure limitation; synthetic data;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-8439-3
Electronic_ISBN :
978-0-7695-4211-9
Type :
conf
DOI :
10.1109/SocialCom.2010.123
Filename :
5591463