The Fairness-Accuracy Trade-Off: A Machine Learning Regression Model Case

September 28, 2023
Authored by
Franklin Cardenoso Fernandez
Researcher at Holistic AI
In an era where machine learning models make significant decisions that affect people's lives, ensuring fairness in these models is not just a priority but a necessity. Balancing fairness with accuracy, however, is challenging without the appropriate techniques in place, especially in regression models.

In a previous blog, we explored the fairness-accuracy trade-off for classification models, that is, how much accuracy these models retain when mitigation methods are applied to address potential biases.

Taking this analysis as inspiration, in this blog post we apply a similar approach to study the trade-off in regression models.

The main motivation of this analysis is to find the configurations with which the models achieve their best performance, with and without bias mitigators, and then to analyse the results to answer two questions: which algorithm should be used for a given task, and which algorithms provide the most balanced trade-off between accuracy and fairness?

A quick recap: fairness vs. accuracy trade-off

Before we start our analysis, let’s briefly recap the fairness-accuracy trade-off. As explored in our previous blog, alongside the long-standing research goal of maximising accuracy, mitigation methods have been developed to reduce bias in models. However, although these techniques do make models fairer, this ethical improvement usually comes at the expense of other metrics, such as accuracy.

Consequently, this raises the following question: how do we find the mitigator that provides the best balance between fairness and accuracy for a given task?

As we presented, Haas addressed this gap with a framework that helps to identify the most suitable model: optimisation techniques are first used to find the best configuration of the model with each mitigator, and a cost function that combines the resulting metric values is then used to select the best candidate for the task.

Fairness-accuracy trade-off in regression models: a case study

Given the interesting results we obtained with this approach for classification tasks, we will now extend it to regression tasks.

Regression models are those intended for predicting continuous values, such as house or stock prices. To illustrate the fairness-accuracy trade-off in regression models, we will use the well-known “Communities and Crime” dataset, which contains socio-economic data used to predict the crime rate of communities in the United States. We will then follow the guidelines presented in our previous blog to perform the trade-off analysis, training and optimising the models for this case study.

Case study implementation

First, we need three Python packages: DEAP for the multi-objective optimisation, scikit-learn for training the models and computing the accuracy metric, and holisticai for the bias mitigators and the fairness metrics.

We will follow the same pipeline as in the classification case:

  • Data pre-processing
  • Metrics definition
  • Regression model and mitigators selection
  • Optimisation process
  • Best model selection

Remember that you can find the complete implementation of this case study at the following link.

Data pre-processing

For our analysis, as explained, we will use the “Communities and Crime” dataset from the UCI Machine Learning Repository. This publicly available dataset contains socio-economic and law enforcement data for 1,994 communities in the United States, including demographic variables such as population size, race and education level. The objective is to predict the per-capita crime rate of each community. The protected attribute we will use in this analysis is the percentage of the population that is Caucasian (‘racePctWhite’).

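import pandas as pd
from sklearn.model_selection import train_test_split
from holisticai.datasets import load_us_crime

# load the "Communities and Crime" data and join features and target in one frame
dataset = load_us_crime(return_X_y=False, as_frame=True)
df = pd.concat([dataset["data"], dataset["target"]], axis=1)

# preprocessing helper (defined outside this excerpt): returns the cleaned frame and
# the membership arrays of the two groups defined by 'racePctWhite'
df_clean, group_a, group_b = preprocess_us_crime_dataset(df, 'racePctWhite')

# features are all columns but the last; the last column is the target
X = df_clean.values[:, :-1]
y = df_clean.values[:, -1]

# 80/20 split; the group arrays are shuffled together with X and y
X_train, X_test, y_train, y_test, group_a_tr, group_a_ts, group_b_tr, group_b_ts = \
    train_test_split(X, y, group_a, group_b, test_size=0.2, random_state=42)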
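Because ‘racePctWhite’ is a continuous percentage, it has to be turned into two groups before group-based metrics and mitigators can be applied. The rule used inside preprocess_us_crime_dataset is not shown in this excerpt, so the sketch below simply assumes a hypothetical median split; split_groups is an illustrative name, not part of holisticai.

```python
from statistics import median


def split_groups(values, threshold=None):
    """Hypothetical helper: turn a continuous protected attribute into two
    membership masks. The default median threshold is an assumption, not
    necessarily the rule used in the full implementation."""
    if threshold is None:
        threshold = median(values)
    group_a = [v > threshold for v in values]  # communities above the threshold
    group_b = [not a for a in group_a]         # the complementary group
    return group_a, group_b


# toy normalised 'racePctWhite' values for six communities
pct_white = [0.90, 0.15, 0.72, 0.40, 0.98, 0.05]
group_a, group_b = split_groups(pct_white)
print(group_a)  # [True, False, True, False, True, False]
```

Any other threshold (or a quantile-based rule) can be passed explicitly through the threshold argument.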

Metrics definition

After preprocessing the dataset, defining the protected groups and splitting the data into training and test sets, we need to decide which metrics will define the objective function in the optimisation stage. Since we are now working with regression models, an error measure is a natural way to assess accuracy; for this analysis we will use the mean squared error (MSE).

As in the previous blog, the fairness metric will be the regression version of the “max absolute statistical parity”. This function computes the maximum, over all thresholds, of the absolute statistical parity between the two groups. The only thing to keep in mind is that this measure is bounded below by 0, which is the desired value.

from holisticai.bias.metrics import max_statistical_parity
from sklearn.metrics import mean_squared_error
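To make the two metrics concrete, here is a small, dependency-free sketch of both. The MSE function follows the standard definition. max_abs_statistical_parity is a simplified re-implementation for illustration only: the threshold grid and edge-case handling in holisticai’s max_statistical_parity may differ, and both function names here are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum of (y_i - y_hat_i)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def max_abs_statistical_parity(group_a, group_b, y_pred, n_thresholds=100):
    """Simplified illustration (not holisticai's code): for thresholds taken at
    quantiles of the predictions, compare the share of each group predicted
    above the threshold and return the largest absolute gap."""
    preds_a = [p for p, in_a in zip(y_pred, group_a) if in_a]
    preds_b = [p for p, in_b in zip(y_pred, group_b) if in_b]
    ordered = sorted(y_pred)
    worst = 0.0
    for k in range(n_thresholds):
        # approximate k-th quantile of all predictions
        t = ordered[k * len(ordered) // n_thresholds]
        rate_a = sum(p > t for p in preds_a) / len(preds_a)
        rate_b = sum(p > t for p in preds_b) / len(preds_b)
        worst = max(worst, abs(rate_a - rate_b))
    return worst


group_a = [True, True, False, False]
group_b = [False, False, True, True]
print(mse([0.1, 0.4, 0.3, 0.8], [0.2, 0.4, 0.1, 0.6]))                   # ≈ 0.0225
print(max_abs_statistical_parity(group_a, group_b, [0.9, 0.8, 0.2, 0.1]))  # 1.0
print(max_abs_statistical_parity(group_a, group_b, [0.9, 0.1, 0.9, 0.1]))  # 0.0
```

In the second call group A is always predicted above group B, giving the worst possible gap; in the third call both groups have identical prediction distributions, giving the ideal value of 0.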

Regression model and mitigators selection

As expected, the choice of model depends on the task. Since we are dealing with a regression problem, for simplicity we will use the Ridge regressor from scikit-learn. This model penalises the size of the coefficients by minimising a penalised residual sum of squares, controlled by a complexity parameter.
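For reference, the penalised residual sum of squares minimised by scikit-learn’s Ridge can be written as follows, where w are the model coefficients and α ≥ 0 is the complexity (penalisation) parameter tuned during the optimisation (not to be confused with the cost-function weight α used at the end):

$$
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

Larger values of α shrink the coefficients more strongly towards zero.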

Moreover, to perform the trade-off analysis, we will compare a baseline model without any mitigation against two bias mitigation techniques: one applied at the pre-processing stage and one at the post-processing stage.

Given their fast processing times and good results, we will use the Correlation Remover (pre-processing) and the Wasserstein Barycenter (post-processing) methods.
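To give some intuition for the pre-processing approach: correlation removal transforms the features so that they are no longer linearly correlated with group membership. The sketch below shows the core idea for a single feature and a binary group indicator, subtracting the least-squares fit of the feature on the centred indicator. It is a simplified illustration, not holisticai’s CorrelationRemover, and remove_correlation is a hypothetical name.

```python
from statistics import fmean


def remove_correlation(feature, group):
    """Subtract the least-squares fit of `feature` on the centred group
    indicator, leaving a feature that is linearly uncorrelated with `group`."""
    s = [1.0 if g else 0.0 for g in group]
    s_mean, x_mean = fmean(s), fmean(feature)
    cov = sum((si - s_mean) * (xi - x_mean) for si, xi in zip(s, feature))
    var = sum((si - s_mean) ** 2 for si in s)
    beta = cov / var  # regression slope of the feature on the indicator
    return [xi - beta * (si - s_mean) for xi, si in zip(feature, s)]


feature = [3.0, 4.0, 5.0, 1.0, 2.0, 3.0]  # toy values: group A has a higher mean
group = [True, True, True, False, False, False]
print(remove_correlation(feature, group))  # [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
```

After the transformation both groups share the same feature values, so the feature no longer carries information about group membership; the model loses whatever predictive signal was tied to that correlation.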

from sklearn import linear_model

# Preprocessing methods
from holisticai.bias.mitigation import CorrelationRemover

# Postprocessing methods
from holisticai.bias.mitigation import WassersteinBarycenter
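The post-processing method acts on the model’s predictions instead. In one dimension, the Wasserstein barycenter of the two groups’ prediction distributions has a quantile function equal to the group-size-weighted average of the groups’ quantile functions, so each prediction can be moved to the barycenter value at its within-group rank. The sketch below implements that idea with empirical quantiles; it is a simplified illustration (no tie-breaking or smoothing) rather than holisticai’s WassersteinBarycenter, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_values, v):
    """Share of values in `sorted_values` that are <= v."""
    return bisect_right(sorted_values, v) / len(sorted_values)


def empirical_quantile(sorted_values, u):
    """Smallest value whose empirical CDF is at least u (u in (0, 1])."""
    n = len(sorted_values)
    index = math.ceil(u * n - 1e-9) - 1  # small tolerance guards against round-off
    return sorted_values[min(max(index, 0), n - 1)]


def barycenter_adjust(y_pred, in_group_a):
    """Map every prediction onto the 1-D Wasserstein barycenter of the two
    groups' prediction distributions."""
    by_group = {
        True: sorted(p for p, g in zip(y_pred, in_group_a) if g),
        False: sorted(p for p, g in zip(y_pred, in_group_a) if not g),
    }
    weights = {g: len(vals) / len(y_pred) for g, vals in by_group.items()}
    adjusted = []
    for p, g in zip(y_pred, in_group_a):
        u = empirical_cdf(by_group[g], p)  # rank of p within its own group
        # barycenter quantile = group-size-weighted average of the group quantiles
        adjusted.append(sum(weights[h] * empirical_quantile(by_group[h], u) for h in by_group))
    return adjusted


preds = [0.8, 0.9, 1.0, 0.1, 0.2, 0.3]  # toy predictions: group A scored higher
in_a = [True, True, True, False, False, False]
print([round(v, 2) for v in barycenter_adjust(preds, in_a)])
# [0.45, 0.55, 0.65, 0.45, 0.55, 0.65]
```

In the toy example both groups end up with the same set of predictions, so their statistical parity gap vanishes, at the cost of moving every prediction away from its original value: this is exactly the accuracy cost discussed in this post.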

Optimisation process

Continuing with the trade-off analysis, the next step is to solve the multi-objective optimisation problem with an evolutionary technique, in this case a Genetic Algorithm (GA).

Our intention is to minimise both the error of the model and the max statistical parity, so both objectives are minimised.

To keep the evolutionary process as simple as possible, we will only vary some of the regression model’s hyperparameters, which form the chromosome of each GA individual: the penalisation parameter, the maximum number of iterations and the solver type. The bias mitigators’ parameters are left at their default values.
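Because each gene has a fixed position in the chromosome ([solver, alpha, max_iter]), a one-point crossover such as DEAP’s cxOnePoint can only exchange values between matching positions, so offspring are always valid hyperparameter combinations. A minimal sketch of the same idea on plain lists; the chromosome values are hypothetical, and this version returns new lists whereas cxOnePoint modifies the individuals in place.

```python
import random

# hypothetical chromosomes: [solver, alpha, max_iter]
parent1 = ["svd", 1.5, 500]
parent2 = ["saga", 0.2, 900]


def one_point_crossover(ind1, ind2, rng=random):
    """Swap the genes after a random cut point. Genes keep their positions,
    so each child is still a valid [solver, alpha, max_iter] triple."""
    cut = rng.randint(1, len(ind1) - 1)
    return ind1[:cut] + ind2[cut:], ind2[:cut] + ind1[cut:]


child1, child2 = one_point_crossover(parent1, parent2, random.Random(0))
print(child1, child2)
```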

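from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from holisticai.bias.metrics import max_statistical_parity


def evaluate(individual):
    """Build and test a Ridge model from the parameters in an individual;
    return its MSE and its fairness value (both to be minimised)."""
    # extract the values of the parameters from the individual chromosome
    solver = individual[0]
    alpha = individual[1]
    max_iter = int(individual[2])

    # scale with statistics learned on the training set only
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # train the model with the candidate hyperparameters
    model = linear_model.Ridge(solver=solver, alpha=alpha, max_iter=max_iter)
    model.fit(X_train_scaled, y_train)

    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    sp = max_statistical_parity(group_a_ts, group_b_ts, y_pred)
    fairness = abs(sp)

    # DEAP expects one value per objective, as a tuple
    return mse, fairness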

With all of these elements defined, we can use DEAP to run the optimisation and find the best candidates for each of the three variants, evaluating the fitness function on the chromosomes throughout the process. We will run the process for 20 generations with a population of 100 individuals.

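import random

import numpy as np
from deap import algorithms, base, creator, tools

# Search space. Illustrative values only: the ranges used in the original experiment are not given here.
solvers = ["auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga"]
alpha_lower_value, alpha_upper_value = 0.01, 10.0
lower_max_iter, upper_max_iter = 100, 1000

# negative weights: both objectives (MSE and fairness) are minimised
creator.create("FitnessMin", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# Possible parameter values (one generator per gene)
toolbox.register("attr_solver", random.choice, solvers)
toolbox.register("attr_c_param", random.uniform, alpha_lower_value, alpha_upper_value)
toolbox.register("attr_max_iter", random.randint, lower_max_iter, upper_max_iter)

# an individual is [solver, alpha, max_iter]; n=1 generates the triplet once
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_solver, toolbox.attr_c_param, toolbox.attr_max_iter), n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxOnePoint)


def mutate(individual, indpb=0.3):
    # eaSimple needs a mutation operator. This one is an assumption (not shown in the
    # original): each gene is resampled from its own range with probability indpb.
    generators = (toolbox.attr_solver, toolbox.attr_c_param, toolbox.attr_max_iter)
    for i, generate in enumerate(generators):
        if random.random() < indpb:
            individual[i] = generate()
    return individual,


toolbox.register("mutate", mutate)
# selTournament compares the weighted fitness tuples lexicographically
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register("evaluate", evaluate)

population_size = 100
crossover_probability = 0.7
mutation_probability = 0.05
number_of_generations = 20

pop = toolbox.population(n=population_size)
hof = tools.ParetoFront()  # keeps every non-dominated individual found during the run

stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("min", np.min, axis=0)
stats.register("max", np.max, axis=0)

pop, log = algorithms.eaSimple(pop, toolbox, cxpb=crossover_probability, stats=stats,
                               mutpb=mutation_probability, ngen=number_of_generations,
                               halloffame=hof, verbose=True)

best_parameters = hof[0]  # first member of the Pareto front; every member of hof is a candidate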

Once this is complete, we repeat the process for the two models with mitigation.
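The tools.ParetoFront hall of fame used in the GA keeps every non-dominated individual found during the run, and these individuals are the candidates passed on to model selection. The sketch below shows the dominance test behind it for two minimised objectives, (MSE, fairness); the function names and the candidate values are hypothetical toy data.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every (minimised) objective and
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# toy (MSE, fairness) pairs for five candidate configurations
candidates = [(0.02, 0.75), (0.03, 0.50), (0.04, 0.10), (0.035, 0.60), (0.05, 0.12)]
print(pareto_front(candidates))  # [(0.02, 0.75), (0.03, 0.5), (0.04, 0.1)]
```

None of the surviving points beats another on both objectives at once: moving along the front trades lower error for higher disparity, which is what the plotted Pareto front visualises.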

Best model selection

Once we have optimised the models, we can take the candidates and select the model that provides the best trade-off between accuracy and fairness.

To do this, we will first plot the Pareto front to observe how the best candidates of each approach perform.

As we can observe in the previous graph, for this particular case the three methods behave quite differently: the model without mitigation achieves the best accuracy, while the mitigated models, especially the one using Wasserstein Barycenters, produce fairer results. This is an interesting result because it shows a negative correlation between the two metrics in this case: the approaches with lower error have higher statistical parity, and vice versa.

Now, we will determine the best model by evaluating their metric results with the cost function proposed by Haas.

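$$
C = C_{\text{metric}_1} + C_{\text{metric}_2} = \alpha \cdot (\text{acc metric}) + \beta \cdot (\text{fair metric})
$$

where α and β weight the accuracy metric (here the MSE) and the fairness metric (here the max absolute statistical parity).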

The following table summarises the results after applying the cost function for the different weightings (lower is better in every column):

| Model | MSE | Fairness (max. abs. statistical parity) | Cost (α=1, β=1) | Cost (α=3, β=1) | Cost (α=1, β=3) |
|---|---|---|---|---|---|
| Base model: Ridge Regression (RR) | 0.019 | 0.753 | 0.772 | 0.81 | 2.278 |
| RR with correlation remover | 0.022 | 0.525 | 0.548 | 0.592 | 1.589 |
| RR with Wasserstein Barycenters | 0.036 | 0.093 | 0.128 | 0.2 | 0.315 |

As we can see, the architecture with the best fairness-accuracy trade-off is Ridge Regression with the Wasserstein Barycenters mitigation method, which has the lowest cost of all the tested architectures (lower is better in this case). This conclusion holds for all the cost-function scenarios: equal weighting, more weight on accuracy and more weight on fairness.
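The selection step can be reproduced in a few lines by applying the cost function to each model’s accuracy (MSE) and fairness values for each weighting. The sketch below uses the rounded values reported in the table, so recomputed costs may differ slightly from the reported ones, but the lowest-cost model is the same in all three scenarios. The names are illustrative.

```python
# (MSE, max absolute statistical parity) per model, rounded values from the table
results = {
    "Ridge Regression (RR)": (0.019, 0.753),
    "RR with correlation remover": (0.022, 0.525),
    "RR with Wasserstein Barycenters": (0.036, 0.093),
}


def cost(mse, fairness, alpha=1.0, beta=1.0):
    """C = alpha * (accuracy metric) + beta * (fairness metric); lower is better."""
    return alpha * mse + beta * fairness


winners = {}
for alpha, beta in [(1, 1), (3, 1), (1, 3)]:
    costs = {name: cost(m, f, alpha, beta) for name, (m, f) in results.items()}
    winners[(alpha, beta)] = min(costs, key=costs.get)
    print(f"alpha={alpha}, beta={beta}: lowest cost -> {winners[(alpha, beta)]}")
```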

Concluding remarks

Through this tutorial, we have explained how to evaluate different approaches to determine the architecture that presents the best trade-off between accuracy and fairness for the regression case by following the framework proposed by Haas.

This framework allows us to select different approaches and evaluate them with a fitness function that contains an accuracy metric and a fairness metric, which is then optimised with an evolutionary algorithm (a GA in our case).

The best candidates resulting from this optimisation are then evaluated with a cost function that combines both metrics (accuracy and fairness), and this is the step where the best model is finally selected.

Again, we suggest reading the original publication for more details of the framework. In this tutorial, we have used the “max statistical parity” metric from the “holisticai” package, but feel free to experiment with the wide range of metrics available in our open-source library; the same applies to other accuracy metrics.

DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.
