• DocumentCode
    47843
  • Title

    Differentially Private Frequent Itemset Mining via Transaction Splitting

  • Author

    Sen Su ; Shengzhi Xu ; Xiang Cheng ; Zhengyi Li ; Fangchun Yang

  • Author_Institution
    State Key Lab. of Networking & Switching Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    27
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1875
  • Lastpage
    1891
  • Abstract
    Recently, there has been a growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm which can not only achieve high data utility and a high degree of privacy, but also offer high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, which is referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility and privacy tradeoff, a novel smart splitting method is proposed to transform the database. For a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
  • Keywords
    data mining; data privacy; data reduction; ε-differentially private; FP-growth algorithm; PFP-growth algorithm; data utility; differentially private FIM algorithm; differentially private data mining algorithms; differentially private frequent itemset mining; downward closure property; dynamic reduction method; formal privacy analysis; information loss; privacy degree; run-time estimation method; smart splitting method; time efficiency; transaction splitting; Algorithm design and analysis; Data privacy; Itemsets; Noise; Privacy; Sensitivity; Frequent itemset mining; differential privacy; transaction splitting;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2015.2399310
  • Filename
    7029616