• DocumentCode
    1799821
  • Title

    Privacy Aware Non-linear Support Vector Machine for Multi-source Big Data

  • Author

    Yunmei Lu ; Phoungphol, Piyaphol ; Yanqing Zhang

  • Author_Institution
    Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
  • fYear
    2014
  • fDate
    24-26 Sept. 2014
  • Firstpage
    783
  • Lastpage
    789
  • Abstract
    In order to build reliable prediction models and attain high classification accuracy, assembling datasets from multiple databases maintained by different sources (such as different hospitals) has become increasingly common. However, assembling these composite datasets involves the disclosure of individuals´ records, therefore many local owners are reluctant to share their data due to privacy concerns. This paper presents a framework for building a Privacy-Aware Non-linear Support Vector Machine (PAN-SVM) classifier using distributed data sources. The framework with three layers can do global classification based on distributed data sources and protect individuals´ records at the same time. At the bottom layer, k-means clustering is used to select landmarks that will be used by the medium layer after they are encrypted by a secure sum protocol. The medium layer employs Nystrom low-rank approximation and kernel matrix decomposition techniques to construct a global SVM classifier which is accelerated at the top layer by employing a cutting-plane technique. Simulation results on multiple datasets indicate that the new framework can solve the classification problem on distributed data sources effectively and efficiently, and protect the privacy of individuals´ data as well.
  • Keywords
    approximation theory; data privacy; matrix algebra; support vector machines; Nystrom low-rank approximation; PAN-SVM classifier; assembling datasets; cutting plane technique; distributed data sources; global SVM classifier; kernel matrix decomposition techniques; multiple databases; multisource big data; privacy aware nonlinear support vector machine; secure sum protocol; Accuracy; Data models; Data privacy; Distributed databases; Kernel; Support vector machines; Training; Cutting-plane Method; Distributed data-mining; Low-rank Approximation; Matrix Decomposition; Multi-source Data; Privacy preserving; SVM;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
  • Conference_Location
    Beijing
  • Type

    conf

  • DOI
    10.1109/TrustCom.2014.103
  • Filename
    7011327