CertiFair: A Framework for Certified Global Fairness of Neural Networks

Abstract

We consider the problem of verifying whether a Neural Network (NN) model satisfies global individual fairness. Individual fairness requires that individuals who are similar with respect to a given task be treated similarly by the decision model. This work has two main objectives. The first is to construct a verifier that checks whether the fairness property holds for a given NN in a classification task and provides a counterexample if it is violated; that is, the model is fair if all pairs of similar individuals are classified the same, and unfair if some pair of similar individuals is classified differently. To that end, we construct a sound and complete verifier for global individual fairness properties of ReLU NN classifiers under distance-based similarity metrics. The second objective is to provide a method for training provably fair NN classifiers from unfair (biased) data. We propose a fairness loss, used during training, that enforces fair outcomes for similar individuals, and we then provide provable bounds on the fairness of the resulting NN. We run experiments on commonly used, publicly available fairness datasets.
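
To make the fairness property concrete, the following minimal sketch checks whether a single pair of inputs is a counterexample for a small ReLU classifier. It is an illustration only, not the verifier described above: the function names, the toy weights, and the particular similarity rule (non-sensitive features within a per-feature tolerance `delta`, sensitive features free to differ) are all assumptions chosen for the example. A global verifier must reason over every pair of similar inputs, whereas this code tests one given pair.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def relu_forward(x: Sequence[float], weights: List[Matrix], biases: List[List[float]]) -> List[float]:
    """Forward pass of a fully connected ReLU network; the last layer is linear (logits)."""
    h = list(x)
    for layer, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, h)) + b_i for row, b_i in zip(W, b)]
        is_last = layer == len(weights) - 1
        h = z if is_last else [max(0.0, v) for v in z]
    return h


def classify(x: Sequence[float], weights: List[Matrix], biases: List[List[float]]) -> int:
    """Predicted class = index of the largest logit (first index wins on ties)."""
    logits = relu_forward(x, weights, biases)
    return max(range(len(logits)), key=logits.__getitem__)


def similar(x: Sequence[float], y: Sequence[float], sensitive_idx: Set[int], delta: float) -> bool:
    """Hypothetical distance-based similarity: every non-sensitive feature differs
    by at most delta; sensitive features are ignored."""
    return all(
        abs(a - b) <= delta
        for i, (a, b) in enumerate(zip(x, y))
        if i not in sensitive_idx
    )


def is_counterexample(x, y, weights, biases, sensitive_idx: Set[int], delta: float) -> bool:
    """A pair violates individual fairness if it is similar but classified differently."""
    return similar(x, y, sensitive_idx, delta) and (
        classify(x, weights, biases) != classify(y, weights, biases)
    )


# Toy networks with inputs [feature, sensitive_attribute] (assumed for illustration).
# UNFAIR: the second logit depends directly on the sensitive attribute.
UNFAIR_W = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
UNFAIR_B = [[0.0, 0.0], [0.0, 0.0]]
# FAIR: both logits depend only on the non-sensitive feature.
FAIR_W = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [-1.0, 0.0]]]
FAIR_B = [[0.0, 0.0], [0.0, 0.6]]

x = [0.5, 0.0]  # same non-sensitive feature ...
y = [0.5, 1.0]  # ... different sensitive attribute

unfair_violation = is_counterexample(x, y, UNFAIR_W, UNFAIR_B, {1}, 0.1)
fair_violation = is_counterexample(x, y, FAIR_W, FAIR_B, {1}, 0.1)
```

For this pair, the first toy network assigns different classes to `x` and `y` although they differ only in the sensitive attribute, so the pair is a counterexample; the second network classifies both the same. Passing such a pairwise check does not establish global fairness, since only one pair is examined.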
