• DocumentCode
    2191995
  • Title
    Identifying and Preventing Data Leakage in Multi-relational Classification
  • Author
    Guo, Hongyu; Viktor, Herna L.; Paquet, Eric
  • Author_Institution
    Inst. for Inf. Technol., Nat. Res. Council Canada, Ottawa, ON, Canada
  • fYear
    2010
  • fDate
    13 Dec. 2010
  • Firstpage
    458
  • Lastpage
    465
  • Abstract
    Relational database mining, where data are mined across multiple relations, is increasingly commonplace. In a complex database schema, it becomes difficult to identify all possible relationships between attributes in different relations. Seemingly harmless attributes may therefore be linked to confidential information, leading to data leaks when a model is built. As a result, publishing the results of a data mining exercise risks disclosing unwanted knowledge. For instance, consider a classification task on a financial database that determines whether a loan is high risk. Suppose we know that the database also contains a confidential attribute, such as income level, which should not be divulged. To prevent potential privacy leakage, one may choose to remove or distort the income level attribute. However, even after distortion, a learning model built on the modified database may still accurately predict the income level values. The database therefore remains unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such leaks may be detected. We propose a method to generate a ranked list of sub-schemas that maintain predictive performance on the class attribute while limiting both the disclosure risk and the predictive accuracy of confidential attributes. We illustrate our method on a financial database.
  • Keywords
    data mining; data privacy; learning (artificial intelligence); pattern classification; relational databases; security of data; class attribute; complex database schema; confidential information; data leakage identifying; financial database; learning model; multirelational classification; potential privacy leakage; predictive performance; relational database mining; Multi-relational Classification; Privacy preserving data mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2010.33
  • Filename
    5693333