Author :
Kamishima, Toshihiro ; Akaho, S. ; Asoh, Hideki ; Sakuma, Jun
Abstract :
Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, and religion. The goal of fairness-aware classifiers is to classify data while taking into account the potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown to be superior to the other classifiers in terms of fairness, which is formalized as the statistical independence between a class and a sensitive feature. However, the cause of this superiority is unclear, because the method utilizes a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing this method with an alternative naive Bayes classifier, which is modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization technique" and empirically show that this new method can achieve a level of fairness equal to that of the two-naive-Bayes classifier.
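
The fairness criterion named in the abstract, statistical independence between a class and a sensitive feature, can be checked empirically on a set of predictions. The following is a minimal sketch and not the authors' implementation: the function names `positive_rate_gap` and `mutual_information` are hypothetical, and both the class label and the sensitive feature are assumed to be binary (0/1) with every group non-empty. A gap of zero and a mutual information of zero both indicate that the predicted class is empirically independent of the sensitive feature.

```python
from collections import Counter
from math import log


def positive_rate_gap(labels, sensitive):
    """Positive-class rate in group s=0 minus the rate in group s=1.

    Assumes binary labels and a binary sensitive feature, and that
    both groups contain at least one example.
    """
    rate = {}
    for group in (0, 1):
        members = [y for y, s in zip(labels, sensitive) if s == group]
        rate[group] = sum(members) / len(members)
    return rate[0] - rate[1]


def mutual_information(labels, sensitive):
    """Empirical mutual information (in nats) between class and sensitive feature.

    It is zero exactly when the empirical joint distribution factorizes,
    i.e. when class and sensitive feature are empirically independent.
    """
    n = len(labels)
    joint = Counter(zip(labels, sensitive))
    count_y = Counter(labels)
    count_s = Counter(sensitive)
    mi = 0.0
    for (y, s), c in joint.items():
        # (c/n) * log( (c/n) / ((count_y/n) * (count_s/n)) )
        mi += (c / n) * log(c * n / (count_y[y] * count_s[s]))
    return mi


# Hypothetical toy predictions: the first set is independent of the
# sensitive feature, the second is fully determined by it.
fair_labels, fair_sensitive = [1, 1, 0, 0], [0, 1, 0, 1]
unfair_labels, unfair_sensitive = [1, 1, 0, 0], [0, 0, 1, 1]
```

On the toy data above, the first pair of lists gives a gap and a mutual information of zero, while the second pair gives the largest possible gap for binary data. Real evaluations would apply the same two functions to a classifier's predictions on held-out data.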
Keywords :
Bayes methods; data mining; pattern classification; statistical analysis; actual fair-factorization technique; credit scoring; data classification; data mining technologies; discrimination issue; fairness issue; fairness-aware classifier independence; gender; heuristic post-processing technique; hypothetical fair-factorization; independence issue; neutrality issue; past credit data records; race; religion; statistical independence; statistical prediction techniques; two-naive-Bayes method; Data mining; Data models; Employment; Indexes; Predictive models; Privacy; Standards; discrimination-aware data mining; fairness-aware data mining; naive Bayes; privacy;