Title of article :
Preserving data utility via BART
Author/Authors :
Wang, Xinlei and Karr, Alan F.
Issue Information :
Journal with serial number, year 2010
Abstract :
When preparing data for public release, information organizations face the challenge of preserving the quality of data while protecting the confidentiality of both data subjects and sensitive data attributes. Without knowing what type of analyses will be conducted by data users, it is often hard to alter data without sacrificing data utility. In this paper, we propose a new approach to mitigate this difficulty, which entails using Bayesian additive regression trees (BART), in connection with existing methods for statistical disclosure limitation, to help preserve data utility while meeting confidentiality requirements. We illustrate the performance of our method through both simulation and a data example. The method works well when the targeted relationship underlying the original data is not weak, and the performance appears to be robust to the intensity of alteration.
Keywords :
Utility, Bayesian additive regression trees, Confidentiality, Disclosure limitation, MCMC, Swapping, Variable selection
Journal title :
Journal of Statistical Planning and Inference