DocumentCode
1799821
Title
Privacy Aware Non-linear Support Vector Machine for Multi-source Big Data
Author
Yunmei Lu ; Piyaphol Phoungphol ; Yanqing Zhang
Author_Institution
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
fYear
2014
fDate
24-26 Sept. 2014
Firstpage
783
Lastpage
789
Abstract
To build reliable prediction models and attain high classification accuracy, assembling datasets from multiple databases maintained by different sources (such as different hospitals) has become increasingly common. However, assembling these composite datasets involves disclosing individuals' records, so many local owners are reluctant to share their data because of privacy concerns. This paper presents a framework for building a Privacy-Aware Non-linear Support Vector Machine (PAN-SVM) classifier using distributed data sources. The three-layer framework performs global classification over distributed data sources while protecting individuals' records. At the bottom layer, k-means clustering is used to select landmarks, which are encrypted by a secure sum protocol and then passed to the middle layer. The middle layer employs Nystrom low-rank approximation and kernel matrix decomposition techniques to construct a global SVM classifier, which is accelerated at the top layer by a cutting-plane technique. Simulation results on multiple datasets indicate that the new framework solves the classification problem on distributed data sources effectively and efficiently while protecting the privacy of individuals' data.
Keywords
approximation theory; data privacy; matrix algebra; support vector machines; Nystrom low-rank approximation; PAN-SVM classifier; assembling datasets; cutting plane technique; distributed data sources; global SVM classifier; kernel matrix decomposition techniques; multiple databases; multisource big data; privacy aware nonlinear support vector machine; secure sum protocol; Accuracy; Data models; Data privacy; Distributed databases; Kernel; Support vector machines; Training; Cutting-plane Method; Distributed data-mining; Low-rank Approximation; Matrix Decomposition; Multi-source Data; Privacy preserving; SVM;
fLanguage
English
Publisher
IEEE
Conference_Titel
Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
Conference_Location
Beijing
Type
conf
DOI
10.1109/TrustCom.2014.103
Filename
7011327