  • DocumentCode
    2509547
  • Title
    A MapReduce Based Approach of Scalable Multidimensional Anonymization for Big Data Privacy Preservation on Cloud
  • Author
    Xuyun Zhang; Chi Yang; Surya Nepal; Chang Liu; Wanchun Dou; Jinjun Chen
  • Author_Institution
    Fac. of Eng. & IT, Univ. of Technol. Sydney, Sydney, NSW, Australia
  • fYear
    2013
  • fDate
    Sept. 30-Oct. 2, 2013
  • Firstpage
    105
  • Lastpage
    112
  • Abstract
    The massive increase in computing power and data storage capacity provisioned by cloud computing, together with advances in big data mining and analytics, has expanded the scope of information available to businesses, governments, and individuals by orders of magnitude. Meanwhile, privacy protection is one of the most pressing concerns in big data and cloud applications: customer privacy requires strong preservation, and the problem has attracted considerable attention from both the IT industry and academia. Data anonymization is an effective means of preserving data privacy, and multidimensional anonymization is widely adopted among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data, because they cannot fully leverage cloud resources or cannot be adapted cost-effectively to cloud environments. We therefore propose a scalable multidimensional anonymization approach for big data privacy preservation using MapReduce on cloud. In this approach, we propose a highly scalable median-finding algorithm that combines the median-of-medians idea with a histogram technique, and we control the recursion granularity to achieve cost-effectiveness. We design dedicated MapReduce jobs for the approach, and experimental evaluations demonstrate that it can significantly improve the scalability and cost-effectiveness of the multidimensional scheme over existing approaches.
  • Keywords
    Big Data; cloud computing; data protection; MapReduce; big data privacy preservation; cost-effectiveness; data anonymization; histogram technique; median of medians technique; privacy protection; recursion granularity; scalable median-finding algorithm; scalable multidimensional anonymization approach; Data handling; Data privacy; Data storage systems; Information management; Privacy; Scalability; multidimensional anonymization; privacy preservation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Green Computing (CGC), 2013 Third International Conference on
  • Conference_Location
    Karlsruhe
  • Type
    conf
  • DOI
    10.1109/CGC.2013.24
  • Filename
    6686016
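The median-finding idea summarized in the abstract can be illustrated with a small single-machine sketch. It is not the authors' algorithm. The map and reduce phases are simulated with plain Python functions, the data splits are in-memory lists, and every name (map_histogram, reduce_histograms, histogram_median, median_of_medians_estimate, partition, splits_per_job) is hypothetical. It assumes integer quasi-identifier values and a standard Mondrian-style median split for multidimensional k-anonymity that tries only the widest dimension. The paper's recursion-granularity control is not reproduced; the only stopping rule here is the anonymity parameter k.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[int, ...]  # hypothetical: one record = a tuple of numeric quasi-identifiers


def map_histogram(split: Iterable[int]) -> Counter:
    """Simulated map phase: count each attribute value within one data split."""
    return Counter(split)


def reduce_histograms(histograms: Iterable[Counter]) -> Counter:
    """Simulated reduce phase: sum per-split counts into one global histogram."""
    total: Counter = Counter()
    for h in histograms:
        total.update(h)  # Counter.update adds counts rather than replacing them
    return total


def histogram_median(splits: Sequence[Sequence[int]]) -> int:
    """Exact lower median, found by scanning the merged histogram in value order."""
    hist = reduce_histograms(map_histogram(s) for s in splits)
    n = sum(hist.values())
    if n == 0:
        raise ValueError("no values to take a median of")
    target = (n - 1) // 2  # 0-based rank of the lower median
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen > target:
            return value
    raise AssertionError("unreachable: target rank is always < n")


def median_of_medians_estimate(splits: Sequence[Sequence[int]]) -> int:
    """Cheap approximate median: the lower median of the per-split lower medians."""
    local = [sorted(s)[(len(s) - 1) // 2] for s in splits if s]
    return sorted(local)[(len(local) - 1) // 2]


def partition(records: List[Record], k: int, splits_per_job: int = 4) -> List[List[Record]]:
    """Mondrian-style strict partitioning: split on the widest dimension at its
    median while both halves keep at least k records; otherwise stop."""
    if len(records) < 2 * k:
        return [records]
    dim = max(range(len(records[0])),
              key=lambda d: max(r[d] for r in records) - min(r[d] for r in records))
    column = [r[dim] for r in records]
    # Spread the column over several simulated mappers, round-robin.
    splits = [column[i::splits_per_job] for i in range(splits_per_job)]
    m = histogram_median(splits)
    left = [r for r in records if r[dim] <= m]
    right = [r for r in records if r[dim] > m]
    if len(left) < k or len(right) < k:
        return [records]  # no allowable cut on the widest dimension
    return partition(left, k, splits_per_job) + partition(right, k, splits_per_job)


if __name__ == "__main__":
    demo_splits = [[5, 1, 9], [3, 7], [2, 8, 4, 6]]
    print(median_of_medians_estimate(demo_splits), histogram_median(demo_splits))
    print(partition([(i, 0) for i in range(1, 9)], k=2))
```

On the splits [5, 1, 9], [3, 7], [2, 8, 4, 6], the median of the per-split medians is 4 while the histogram scan returns the exact median 5. That gap suggests one plausible reason to pair the two ideas: the median of medians is cheap to compute but only approximate, whereas the merged histogram gives the exact split point at the cost of shuffling per-value counts.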