With the increasing use of machine learning models in many different areas, addressing bias in these models has become important. Bias can take different forms, such as racial, gender or socioeconomic bias, and can lead to unfair outcomes in decision-making processes. A typical setting is classification, where models are trained to assign data to different categories. To address this issue, researchers have developed a range of strategies and techniques to mitigate the bias present in machine learning models. In this article, we explore some of the methods developed to overcome this challenge.
Machine learning has grown rapidly in popularity in recent years and has become an important part of our lives, often in ways that people don’t realise. For instance, recommender systems such as those of Amazon, Netflix and Spotify use ML algorithms to analyse users’ past behaviour and suggest products, movies or songs. Another example is regression, where models are used for financial forecasting, cost estimation and marketing. One of the most common ML tasks is classification, where models predict class labels. It is used in a wide range of applications, such as sentiment analysis of a tweet or a product review, or medical decision support.
While increasing the accuracy or efficacy of classification models is key, it’s equally important to ensure that these systems minimize bias against groups defined by sensitive or protected attributes (such as race or gender), which are often under-represented in the data. This commitment to fairness is especially vital for systems whose outcomes can significantly affect individuals’ lives.
One of the most prominent examples of bias in an ML model is the well-known COMPAS software, used by courts in the US to predict whether a person will re-offend. Although the algorithm considered various factors to generate its outcomes, later research showed that the model exhibited bias against black defendants compared with white defendants. This is not an isolated case; bias has also been found in other areas, such as health care and job recommendations. It is therefore vital to be aware of the ethical implications of ML and to ensure that bias is not perpetuated when models are developed.
Alongside fairness metrics such as Equalized Odds, Demographic (or Statistical) Parity and Equal Opportunity, which are used to identify bias in deployed models, data scientists have devised multiple methods and strategies for reducing bias during model development. These methods are usually grouped into three categories according to the stage of the pipeline they act on: pre-processing, in-processing and post-processing methods. Although bias has been found in many different tasks, most efforts focus on classification problems, particularly binary classification, as shown in the survey by Hort et al., whose categorisation is followed below.
Pre-processing methods act at the earliest stage of the pipeline, changing or adjusting the dataset to remove bias before it is used as input for an ML model. The intention is that fairer data leads to a fairer model.
These methods can be grouped into the following categories: relabelling and perturbation, sampling, and representation.
Relabelling and perturbation methods apply changes to the ground-truth labels or to the dataset features. Relabelling changes the ground-truth labels of some instances to balance the dataset, while perturbation adds noise or slight variations to the input data to create a more balanced representation of the dataset. Both reduce the impact of the bias in the original dataset and help models learn to classify more fairly or to be more robust to biased data.
The Disparate Impact Remover, for example, uses perturbation to modify feature values to increase group fairness: the feature distributions of the privileged and unprivileged groups are moved closer together while the rank ordering of values within each group is preserved. Another popular approach, which uses relabelling, is the one proposed by Kamiran and Calders known as “massaging”, which uses a ranker to score the instances of the dataset and select the candidates to be relabelled.
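To make the two ideas concrete, here is a minimal NumPy sketch of each. `disparate_impact_repair` assumes a single numeric feature and follows the quantile-based "move every group towards a median distribution" idea as we understand it from the Disparate Impact Remover. `massage` assumes a binary label, a binary protected attribute (1 = privileged) and precomputed ranker scores, and swaps just enough labels to equalise the groups' positive rates. Function names, arguments and the tie-breaking are our own illustrative choices, not a library API.

```python
import numpy as np


def disparate_impact_repair(x, group, repair_level=1.0):
    """Move each group's values of one feature towards a common (median) distribution.

    Rank order within each group is preserved; repair_level=1.0 means full repair.
    """
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    groups = np.unique(group)
    repaired = x.copy()
    for g in groups:
        mask = group == g
        vals = x[mask]
        # within-group quantile of each value, based on its rank (ties broken arbitrarily)
        ranks = np.argsort(np.argsort(vals))
        q = ranks / max(len(vals) - 1, 1)
        # target value: median over groups of each group's quantile function at q
        target = np.median([np.quantile(x[group == h], q) for h in groups], axis=0)
        repaired[mask] = (1 - repair_level) * vals + repair_level * target
    return repaired


def massage(y, s, scores):
    """Relabel instances so that both groups end up with the same positive rate.

    s: 1 = privileged, 0 = unprivileged; scores: a ranker's estimate of P(y = 1).
    """
    y = np.asarray(y).copy()
    s = np.asarray(s)
    scores = np.asarray(scores, dtype=float)
    n_p, n_u = np.sum(s == 1), np.sum(s == 0)
    pos_p, pos_u = np.sum(y[s == 1]), np.sum(y[s == 0])
    # number of promotion/demotion pairs that equalises the positive rates
    m = int(round((pos_p * n_u - pos_u * n_p) / (n_p + n_u)))
    if m <= 0:
        return y  # the privileged group is not favoured; nothing to relabel
    # promote the unprivileged negatives the ranker finds most likely to be positive
    promote = np.where((s == 0) & (y == 0))[0]
    promote = promote[np.argsort(-scores[promote])][:m]
    # demote the privileged positives the ranker finds least likely to be positive
    demote = np.where((s == 1) & (y == 1))[0]
    demote = demote[np.argsort(scores[demote])][:m]
    y[promote] = 1
    y[demote] = 0
    return y
```

In both cases a classifier is then trained as usual, on the repaired feature or on the relabelled targets respectively.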
Sampling methods add or remove samples from the training data to change its distribution, known as up-sampling (over-sampling) and down-sampling (under-sampling) respectively. Samples of the minority group are duplicated or synthesised, or instances of the majority group are removed. The Synthetic Minority Over-sampling Technique (SMOTE), for example, generates synthetic minority samples and combines this over-sampling with under-sampling of the majority class to balance datasets.
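The core of SMOTE's over-sampling step fits in a few lines: each synthetic sample lies on the segment between a randomly chosen minority instance and one of its k nearest minority neighbours. This NumPy sketch assumes purely numeric features and uses brute-force distances; the name `smote` and its arguments are illustrative, and the under-sampling of the majority class is left out.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from the minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # brute-force pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # for each new point, pick a random base sample and one of its k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    # interpolate at a random position along the segment base -> neighbour
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Because every synthetic point is an interpolation, new samples stay inside the region already occupied by the minority group rather than being exact duplicates.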
Another way to sample data is to adjust the influence of each training instance by reweighing it according to its label and protected attributes, to ensure fairness before classification. The weighted instances can then be used to train ML models as usual. This is the approach of the Reweighing algorithm which, instead of changing labels, assigns weights to the tuples in the training dataset so that they can be used by any method based on frequency counts.
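In the Reweighing algorithm of Kamiran and Calders, each combination of protected-attribute value s and label y receives the weight P(s)·P(y) / P(s, y): the frequency expected if s and y were independent, divided by the observed frequency. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np


def reweighing_weights(s, y):
    """Per-instance weights P(s) * P(y) / P(s, y) for a protected attribute s and label y."""
    s = np.asarray(s)
    y = np.asarray(y)
    weights = np.empty(len(y), dtype=float)
    for s_val in np.unique(s):
        for y_val in np.unique(y):
            cell = (s == s_val) & (y == y_val)
            if cell.any():
                expected = np.mean(s == s_val) * np.mean(y == y_val)
                observed = np.mean(cell)
                weights[cell] = expected / observed
    return weights
```

With `s = [1, 1, 1, 1, 0, 0, 0, 0]` and `y = [1, 1, 1, 0, 1, 0, 0, 0]`, over-represented combinations such as (privileged, positive) get weight 2/3 and under-represented ones such as (unprivileged, positive) get weight 2, which gives both groups a weighted positive rate of 0.5. The weights can then be passed to any learner that accepts per-sample weights.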
Representation methods learn new representations of the training data in order to reduce bias while retaining as much of the original information as possible. One of the first algorithms of this kind was Learning Fair Representations (LFR), which transforms the training data by finding a latent representation that encodes the data, minimising the loss of information about non-sensitive attributes while removing information about protected attributes. Another interesting approach is the Prejudice Free Representations (PFR) algorithm, which identifies and removes discriminating features by processing the dataset iteratively and removing information about the features that are related to the sensitive attributes. Crucially, this approach is agnostic to the learning algorithm, so it can be used for classification or even regression tasks.
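As we understand Zemel et al.'s formulation, LFR maps each instance to a soft assignment over K prototypes and minimises a weighted sum of three terms: a parity term L_z (each prototype should be used equally by both groups), a reconstruction term L_x and a prediction term L_y. The sketch below only evaluates that objective in NumPy. It uses plain squared Euclidean distances (the original also learns per-feature distance weights), and the hyperparameter values are illustrative assumptions. Fitting would minimise the objective over `prototypes` and `w` with a generic optimiser, which is not shown.

```python
import numpy as np


def lfr_objective(X, y, s, prototypes, w, A_x=0.01, A_y=1.0, A_z=50.0):
    """Evaluate an LFR-style objective; returns (total, L_x, L_y, L_z).

    X: (n, d) features, y: (n,) binary labels, s: (n,) binary protected attribute,
    prototypes: (K, d) prototype locations, w: (K,) per-prototype P(y = 1) in (0, 1).
    """
    # soft assignment M[n, k]: softmax over negative squared distances to prototypes
    sq_dist = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist - (-sq_dist).max(axis=1, keepdims=True)  # numerical stability
    M = np.exp(logits)
    M /= M.sum(axis=1, keepdims=True)
    # L_z: difference between the groups' average prototype usage
    L_z = np.abs(M[s == 1].mean(axis=0) - M[s == 0].mean(axis=0)).sum()
    # L_x: how well the prototypes reconstruct the original features
    L_x = ((X - M @ prototypes) ** 2).sum()
    # L_y: cross-entropy of the label predicted from the representation
    y_hat = np.clip(M @ w, 1e-12, 1 - 1e-12)
    L_y = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)).sum()
    return A_x * L_x + A_y * L_y + A_z * L_z, L_x, L_y, L_z
```

The weights trade the three goals off against each other: a large `A_z` forces both groups onto the same prototypes (fairness), while `A_x` and `A_y` keep the representation informative.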
In-processing methods act during the training of ML models, modifying or manipulating the learning algorithms to increase model fairness.
These methods can be grouped into the following categories: regularization and constraints, adversarial learning, and adjusted learning.
Regularization and constraint methods act on the loss function of the algorithm. Regularization adds an extra term that penalises discrimination, while constraint-based methods limit the level of bias allowed during training through constraints defined over pre-defined protected attributes.
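As a generic illustration of the regularization idea (not any specific published method), the NumPy sketch below trains a logistic regression whose loss is the usual cross-entropy plus `lam` times the squared gap between the groups' mean predicted probabilities, a smooth proxy for demographic parity. The penalty form, the names and the plain gradient-descent loop are all our assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fair_logistic_regression(X, y, s, lam=1.0, lr=0.1, epochs=3000):
    """Gradient descent on cross-entropy + lam * (mean p | s=1 - mean p | s=0)^2."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    g1, g0 = s == 1, s == 0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # gradient of the mean cross-entropy loss
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        # gradient of the demographic-parity penalty lam * gap^2
        gap = p[g1].mean() - p[g0].mean()
        dp = p * (1 - p)  # derivative of the sigmoid
        dgap_w = (X[g1] * dp[g1, None]).mean(axis=0) - (X[g0] * dp[g0, None]).mean(axis=0)
        dgap_b = dp[g1].mean() - dp[g0].mean()
        grad_w += 2 * lam * gap * dgap_w
        grad_b += 2 * lam * gap * dgap_b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Setting `lam = 0` recovers ordinary logistic regression; increasing it trades some accuracy for a smaller gap between the groups' predicted positive rates.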
Among the regularization methods is the Prejudice Remover technique, which reduces the statistical dependence between the sensitive features and the remaining information by adding a fairness (“prejudice remover”) regularization term alongside the usual regularization term that prevents over-fitting.
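Following our reading of Kamishima et al., the prejudice remover term is an empirical, mutual-information-style measure of how much the model's predicted probabilities depend on the sensitive feature; during training it is scaled by a weight η and added to the L2-regularised logistic loss. This sketch only evaluates the term for given predictions, and the function name is ours.

```python
import numpy as np


def prejudice_remover_term(p, s, eps=1e-12):
    """Mutual-information-style penalty between predictions p = P(y=1 | x, s) and s."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    s = np.asarray(s)
    probs = np.column_stack([1 - p, p])  # P(y | x_n, s_n) for y in {0, 1}
    marginal = probs.mean(axis=0)         # estimate of P(y)
    total = 0.0
    for s_val in np.unique(s):
        mask = s == s_val
        conditional = probs[mask].mean(axis=0)  # estimate of P(y | s)
        total += (probs[mask] * np.log(conditional / marginal)).sum()
    return total
```

The term is zero when the predicted label distribution is the same in every group and grows as the groups' predictions diverge, so minimising it pushes the model towards independence from the sensitive feature.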
The Exponentiated Gradient Reduction technique, on the other hand, produces a randomized classifier that satisfies the desired constraints. It does so by reducing binary classification subject to formalised demographic parity or equalized odds constraints to a sequence of cost-sensitive classification problems. The same reduction is also used in a Grid Search Reduction variant.
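The reduction behind Exponentiated Gradient and Grid Search can be illustrated for a single, simplified demographic-parity constraint between two groups. For a fixed Lagrange multiplier `lam`, minimising error(h) + lam · (E[h | a=1] - E[h | a=0]) is a cost-sensitive problem: each example has a cost for predicting 0 and a cost for predicting 1, which can be turned into relabelled, weighted examples for any classifier that accepts sample weights. The hypothetical helper below computes only that step. The full methods also update `lam` (Exponentiated Gradient) or scan a grid of values (Grid Search), and the published formulation constrains each group against the overall rate rather than using this single difference.

```python
import numpy as np


def dp_cost_sensitive_step(y, a, lam):
    """Turn err(h) + lam * (E[h|a=1] - E[h|a=0]) into labels and weights.

    y: binary labels, a: binary sensitive attribute, lam: fixed Lagrange multiplier.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    n, n1, n0 = len(y), np.sum(a == 1), np.sum(a == 0)
    # per-example cost of predicting 0 (a missed positive) ...
    cost0 = y / n
    # ... and of predicting 1 (a false positive, plus lam times its effect on the parity gap)
    cost1 = (1 - y) / n + lam * ((a == 1) / n1 - (a == 0) / n0)
    # the cheaper prediction becomes the label; the cost difference becomes the weight
    labels = (cost1 < cost0).astype(int)
    weights = np.abs(cost0 - cost1)
    return labels, weights
```

With `lam = 0` the labels are the original ones with equal weights; a large `lam` pushes predictions for the group with `a = 1` towards 0 and for the other group towards 1, which is why the multiplier has to be tuned.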
The Meta Fair Classifier uses a similar approach, with the difference that it takes the fairness constraint as input to a meta-algorithm, reducing the general problem to solving a family of linearly constrained optimization problems.
The key idea of adversarial learning is to train competing models. While the predictor tries to predict the true label, an adversary tries to exploit the fairness issue, for example by predicting the protected attribute from the predictor’s output. This idea is used by the Adversarial Debiasing technique, where a learner is trained to predict an output variable correctly from a given input while remaining unbiased with respect to a protected variable, so that the adversary cannot recover it. Depending on what the adversary observes, this can target fairness definitions such as demographic parity or equality of odds.
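A heavily simplified NumPy sketch of the adversarial setup: a logistic predictor and a logistic adversary that sees only the predictor's logit (the demographic-parity variant) and tries to recover the protected attribute `s`. The predictor descends its label loss minus `alpha` times the adversary's loss. Zhang et al.'s published update also includes a projection term that removes the component of the predictor's gradient helping the adversary; it is omitted here, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def adversarial_debias(X, y, s, alpha=1.0, lr=0.1, epochs=500, seed=0):
    """Train a logistic predictor against a logistic adversary that sees its logit."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = rng.normal(scale=0.01, size=d), 0.0  # predictor parameters
    u, c = 0.0, 0.0                              # adversary: P(s=1) = sigmoid(u * z + c)
    for _ in range(epochs):
        z = X @ w + b
        # adversary step: descend its own cross-entropy for predicting s from z
        q = sigmoid(u * z + c)
        u -= lr * np.mean((q - s) * z)
        c -= lr * np.mean(q - s)
        # predictor step: descend (label loss - alpha * adversary loss) w.r.t. z
        p = sigmoid(z)
        q = sigmoid(u * z + c)
        dz = (p - y) / n - alpha * (q - s) * u / n
        w -= lr * (X.T @ dz)
        b -= lr * dz.sum()
    return w, b
```

To target equality of odds instead, the adversary would additionally receive the true label `y` as input.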
Adjusted learning methods attempt to mitigate bias by developing novel algorithms or by changing the learning procedure of classical ones. For example, Kilbertus et al. use Multi-Party Computation (MPC) methods to adjust classical algorithms such as logistic or linear regression so that they take privacy into account.
Finally, post-processing methods are applied after model training and act on the model outputs. They are especially useful when access to the training data is limited or when the model cannot be accessed directly. Although these methods don’t depend on the model used and don’t require access to the training process, they are less frequently found in the literature than the other two categories, according to the survey by Hort et al.
These methods can be grouped into three categories: input correction, classifier correction and output correction.
Input correction is very similar to pre-processing, the only difference being that the modifications are applied to the testing data after the model has been trained, rather than to the training data before training. An example is the Gradient Feature Auditing (GFA) method, which addresses the problem of auditing black-box models by evaluating the influence of the features of the testing data on the trained model.
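The auditing idea can be sketched as "obscure one feature in the test data and measure how much the trained model's accuracy drops". The sketch below uses a deliberately simpler obscuring step than GFA, which, as we understand it, also removes the feature's information from correlated features to capture indirect influence through proxies. Here the feature's column is just randomly permuted across test rows. `predict` stands for any trained black-box model, and all names are illustrative.

```python
import numpy as np


def feature_influence(predict, X, y, feature, seed=0):
    """Accuracy drop when one feature of the test data is obscured by permutation."""
    rng = np.random.default_rng(seed)
    base_accuracy = np.mean(predict(X) == y)
    X_obscured = X.copy()
    # break the link between this feature and everything else in each row
    X_obscured[:, feature] = rng.permutation(X_obscured[:, feature])
    return base_accuracy - np.mean(predict(X_obscured) == y)
```

Only black-box access to `predict` is needed, which is what makes this kind of audit usable after deployment.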
Classifier correction methods adapt a previously trained classification model to obtain a fairer one. The Linear Programming (LP) approach, for example, optimally adjusts any learned predictor to remove discrimination according to the equalized odds and equality of opportunity constraints. Building on this, the Calibrated Equalized Odds method extends the approach to two binary classifiers, one for the privileged group and one for the unprivileged group, and adjusts their output probabilities with an equalized odds objective, so that the adjusted probability is returned instead of the original prediction.
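As we understand it, the LP approach of Hardt et al. optimises over the groups' true and false positive rates and may randomise its decisions. For equal opportunity on a real-valued score, a simplified, deterministic variant is to choose a separate threshold per group so that every group reaches the same true positive rate. The helper names and the `target_tpr` parameter below are hypothetical.

```python
import numpy as np


def equal_opportunity_thresholds(scores, y, group, target_tpr=0.8):
    """Per-group thresholds so that every group's true positive rate is about target_tpr."""
    thresholds = {}
    for g in np.unique(group):
        positive_scores = scores[(group == g) & (y == 1)]
        # accepting scores at or above the (1 - target_tpr) quantile of the
        # group's positives accepts roughly target_tpr of them
        thresholds[g] = np.quantile(positive_scores, 1 - target_tpr)
    return thresholds


def predict_with_thresholds(scores, group, thresholds):
    return np.array([int(sc >= thresholds[g]) for sc, g in zip(scores, group)])
```

Because only the decision thresholds change, the underlying model is left untouched, which is what makes this a post-processing method.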
Output correction methods apply bias mitigation directly to the model outputs, modifying the predicted labels to obtain fairer ones. One of the earliest examples is Reject Option based Classification (ROC), which exploits the low-confidence region of the classifier, assigning favourable outcomes to unprivileged groups and unfavourable outcomes to privileged groups within that region. A more recent approach is the Randomized Threshold Optimizer algorithm, which debiases learned models by post-processing their predictions as a regularized optimization problem, controlling bias with respect to a sensitive attribute using the statistical parity metric.
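A minimal NumPy sketch of the reject-option idea. Kamiran et al. define the critical region by the classifier's confidence, max(p, 1 - p) ≤ θ; for a binary probability p this is the same as |p - 0.5| ≤ θ - 0.5, which the sketch expresses as a `margin` around the decision threshold. The function name and arguments are illustrative, and in practice the margin would be tuned, for example on validation data, to balance accuracy and fairness.

```python
import numpy as np


def reject_option_classify(scores, privileged, margin=0.1, threshold=0.5):
    """Flip low-confidence predictions in favour of the unprivileged group.

    scores: predicted P(favourable); privileged: boolean mask of the privileged group.
    """
    scores = np.asarray(scores, dtype=float)
    privileged = np.asarray(privileged, dtype=bool)
    labels = (scores >= threshold).astype(int)
    # critical region: predictions within `margin` of the decision threshold
    critical = np.abs(scores - threshold) <= margin
    labels[critical & ~privileged] = 1  # favourable outcome for the unprivileged group
    labels[critical & privileged] = 0   # unfavourable outcome for the privileged group
    return labels
```

Confident predictions (here 0.9 and 0.1) are left alone; only the scores close to 0.5 are reassigned according to group membership.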
In summary, a variety of techniques have been developed to mitigate bias in machine learning models, particularly in classification tasks. These methods are categorised as pre-processing, in-processing and post-processing methods and, as analysed by Hort et al., can be further categorised according to the methodology they apply.
Written by Franklin Fernandez, Researcher at Holistic AI
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.