• DocumentCode
    68931
  • Title
    A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud

  • Author
    Xuyun Zhang ; L.T. Yang ; Chang Liu ; Jinjun Chen

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • Volume
    25
  • Issue
    2
  • fYear
    2014
  • fDate
    Feb. 2014
  • Firstpage
    363
  • Lastpage
    373
  • Abstract
    A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
  • Keywords
    cloud computing; data privacy; parallel processing; MapReduce; TDS approach; large-scale data set anonymization; privacy preservation; scalable two-phase top-down specialization approach; Algorithm design and analysis; Distributed algorithms; Distributed databases; Privacy; Scalability; Taxonomy; Data anonymization; cloud; top-down specialization
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2013.48
  • Filename
    6470603