A New Privacy-Preserving Data Publishing Method Aimed at Improving Classification Accuracy on Anonymized Data

Title in another language

A New Privacy Preserving Data Publishing Technique Conserving Accuracy of Classification on Anonymized Data

Authors

Ebrahimi Atani, Reza (University of Guilan); Sadeghpour, Mehdi (University of Guilan)

Keywords

Suppression operator, decision tree, privacy preservation, classification, anonymization

Abstract (Persian)

With the steady growth of e-government services, individuals' personal information is now stored in databases held by public and private institutions and organizations. In many cases, processing these large and valuable data sources and extracting knowledge from them requires publishing the data and making it available to other institutions and companies, which creates security challenges because it can violate individuals' privacy. In this paper, after a thorough review of prior work on privacy preservation in data publishing, an efficient anonymization method is presented whose goal is to preserve classification accuracy on anonymized data. Using a decision tree, this method prevents the publication of information that has little effect on the utility of the output data and whose removal ensures privacy. One challenge of schemes that use the generalization operator for anonymization is the need to build a taxonomy tree for each quasi-identifier, which has mostly been done automatically. The proposed scheme does not require building a taxonomy tree. Simulation results and evaluations show that there is little difference in accuracy between classification algorithms trained on a standard data set anonymized by this method and those trained on the original data set.
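
The idea of suppressing low-utility information can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which this record does not specify. It assumes whole-attribute suppression, uses information gain (the split criterion of ID3-style decision trees) as a stand-in for the decision tree, and uses k-anonymity as the privacy requirement. The function names, the `"*"` suppression marker, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, label):
    """Reduction in class-label entropy obtained by splitting on attr."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())


def suppress_least_useful(rows, quasi_ids, label, k):
    """Suppress quasi-identifiers in order of increasing information gain
    until the table is k-anonymous (assumed policy, for illustration only)."""
    rows = [dict(r) for r in rows]  # work on a copy
    order = sorted(quasi_ids, key=lambda a: information_gain(rows, a, label))
    suppressed = []
    for attr in order:
        if is_k_anonymous(rows, quasi_ids, k):
            break
        for r in rows:
            r[attr] = "*"  # hypothetical suppression marker
        suppressed.append(attr)
    return rows, suppressed


# Toy data: "age" predicts the class label, "city" does not.
records = [
    {"age": "young", "city": "A", "income": "low"},
    {"age": "young", "city": "B", "income": "low"},
    {"age": "young", "city": "C", "income": "low"},
    {"age": "old", "city": "A", "income": "high"},
    {"age": "old", "city": "B", "income": "high"},
    {"age": "old", "city": "C", "income": "high"},
]
anonymized, dropped = suppress_least_useful(records, ["age", "city"], "income", k=2)
print(dropped)
```

In this toy table the `city` attribute carries no information about the class label, so it is suppressed first, and the remaining `age` attribute still supports classification. Note that information gain is biased toward attributes with many distinct values, so a practical scheme would likely need a different ranking criterion.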

Abstract (English)

Data collection and storage have been facilitated by the growth of electronic services, which has led to the recording of vast amounts of personal information in the databases of public and private organizations. These records often include sensitive personal information (such as income and diseases) and must be protected from access by others. In some cases, however, mining the data and extracting knowledge from these valuable sources creates the need to share them with other organizations. This raises security challenges for users' privacy. Privacy can be described as the sharing of information in a controlled way. In other words, it determines what type of personal information should be shared and which group or person can access and use it. "Privacy preserving data publishing" is a solution for ensuring the secrecy of sensitive information in a data set after it is published in a hostile environment. This process aims to hide sensitive information while keeping the published data suitable for knowledge discovery techniques. Grouping data set records is a broad approach to data anonymization. This technique prevents access to the sensitive attributes of a specific record by eliminating the distinction between a number of data set records. A large number of data publishing models and techniques have been proposed so far, but their utility is a concern when a high level of privacy is required. The main goal of this paper is to present a technique that improves the privacy and performance of data publishing techniques. In this work, we first review previous techniques for privacy preserving data publishing and then present an efficient anonymization method whose goal is to conserve the accuracy of classification on anonymized data. The attack model of this work is based on an adversary inferring a sensitive value in a published data set to a level as high as that of an inference based on public knowledge. Our privacy model and technique use a decision tree to prevent publishing of information that

Year of publication

1397

Journal title

Signal and Data Processing (پردازش علائم و داده ها)

PDF file

7500379

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1017917