Title :
Privacy Aware Non-linear Support Vector Machine for Multi-source Big Data
Author :
Yunmei Lu ; Piyaphol Phoungphol ; Yanqing Zhang
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
Abstract :
To build reliable prediction models and attain high classification accuracy, it has become increasingly common to assemble datasets from multiple databases maintained by different sources (such as different hospitals). However, assembling these composite datasets involves disclosing individuals' records, so many local owners are reluctant to share their data because of privacy concerns. This paper presents a framework for building a Privacy-Aware Non-linear Support Vector Machine (PAN-SVM) classifier from distributed data sources. The three-layer framework performs global classification over distributed data sources while protecting individuals' records. At the bottom layer, k-means clustering selects landmarks, which are encrypted by a secure sum protocol before being passed to the middle layer. The middle layer employs Nystrom low-rank approximation and kernel matrix decomposition techniques to construct a global SVM classifier, which is accelerated at the top layer by a cutting-plane technique. Simulation results on multiple datasets indicate that the new framework solves the classification problem on distributed data sources effectively and efficiently while protecting the privacy of individuals' data.
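As a rough illustration of the secure sum idea mentioned in the abstract, the sketch below simulates a ring-based secure sum in a single process. It is not the paper's implementation: the function name `secure_sum`, the modulus, and the example values are all assumptions made for illustration, and the scheme shown assumes honest-but-curious parties that do not collude.

```python
import secrets

# Assumed public modulus; it must exceed any possible true sum.
MODULUS = 2**61 - 1

def secure_sum(private_values, modulus=MODULUS):
    """Simulate a ring-based secure sum over non-negative integers.

    Each entry of private_values stands for one party's private input
    (hypothetical data, e.g. a per-site count). Only masked running
    totals would travel between parties in a real deployment.
    """
    # The initiating party picks a random mask known only to itself.
    mask = secrets.randbelow(modulus)
    # The initiator sends (mask + its own value) to the next party.
    running = (mask + private_values[0]) % modulus
    # Every other party adds its value to the masked total and forwards it.
    for value in private_values[1:]:
        running = (running + value) % modulus
    # The total returns to the initiator, which removes the mask.
    return (running - mask) % modulus

# Hypothetical per-site values held by three data owners.
site_values = [120, 75, 310]
total = secure_sum(site_values)
```

Because the intermediate totals are offset by a uniformly random mask, a party that sees only the value passed to it learns nothing about the inputs upstream, provided its neighbours do not pool what they observed.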
Keywords :
approximation theory; data privacy; matrix algebra; support vector machines; Nystrom low-rank approximation; PAN-SVM classifier; assembling datasets; cutting plane technique; distributed data sources; global SVM classifier; kernel matrix decomposition techniques; multiple databases; multisource big data; privacy aware nonlinear support vector machine; secure sum protocol; Accuracy; Data models; Data privacy; Distributed databases; Kernel; Support vector machines; Training; Cutting-plane Method; Distributed data-mining; Low-rank Approximation; Matrix Decomposition; Multi-source Data; Privacy preserving; SVM;
Conference_Title :
2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Conference_Location :
Beijing
DOI :
10.1109/TrustCom.2014.103