Measuring and Mitigating Bias: Introducing Holistic AI's Open-Source Library

March 17, 2023
Authored by Cristian Munoz and Kleyton da Costa, Machine Learning Researchers at Holistic AI

Artificial intelligence (AI) is increasingly present in our lives and becoming a fundamental part of many systems and applications. However, like any technology, it is important to ensure that AI-based solutions are trustworthy and fair. That's where the Holistic AI library comes in.

The Holistic AI library is an open-source tool that contains metrics and mitigation strategies to make AI systems safer. Currently, the library offers a set of techniques to easily measure and mitigate Bias across numerous tasks and includes graphics to visualise the analysis. In the future, it will be extended to include tools for Efficacy, Robustness, Privacy and Explainability as well. This will allow a comprehensive and holistic assessment of AI systems.

The advantages of using the Holistic AI library include:

  1. Easy to use: The library is designed to be easy to use, even for those without technical knowledge of AI.
  2. Open-source: As an open-source library, Holistic AI is accessible to everyone and allows the community to contribute to its development and improvement.
  3. Improving the reliability of AI systems: By using the library, you can ensure that your AI systems are reliable and fair, which is especially important in critical applications.
  4. Holistic approach: The library allows for a comprehensive assessment of AI systems, including measures of bias, efficacy, robustness, privacy, and explainability.

In this blog post, we provide an overview of Holistic AI’s Bias analysis and mitigation framework, defining bias and how it can be mitigated before giving an overview of the bias metrics and mitigations available in the Holistic AI library.

What is Bias?

Bias in data can alter our perception of the world, leading to incomplete or inaccurate conclusions. It arises from consistent errors, such as inappropriate sampling or data collection tools, and personal beliefs that influence how we interpret results. To ensure fair and reliable data, it's crucial to detect and address bias, particularly in decision-making or machine learning.

Why is it important to measure bias?

The advances in the use of artificial intelligence systems bring numerous possibilities and challenges to the field of bias evaluation. End users (governments, companies, consumers, etc.) need confidence that the results generated by this type of technology will not reproduce the prejudices and discriminatory behaviours observed in society at large, since these can be transferred to the data. Through bias metrics, we can measure whether a dataset is imbalanced with respect to a particular race, gender, sexual orientation, religion, age group, salary range, etc.

To demonstrate a bit of what can be done to measure bias, let's do a case study with the UCI Adult dataset. This dataset is widely used in machine learning exercises and is well suited to applying bias metrics. It has categorical features (job role, education, marital status, occupation, relationship, race, sex, and native country) and integer features (age, years of study, capital gain, capital loss, and work hours per week). The prediction task is to determine whether a person makes over 50K a year (a binary classification task).

Two important pieces of information can be observed about this dataset. The first is the imbalance between the number of men and women. The pie chart shows that 67% of observations in the dataset are men and only 33% are women; that is, only one-third of the dataset contains information related to women. This gives a clear visualisation of the participation of men and women in the dataset.
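If you want the raw numbers behind that split, a quick pandas check is enough (a sketch assuming the dataset has been loaded into a DataFrame `df` with a `sex` column, as in the snippets below):

```python
# df is the Adult dataset loaded as a pandas DataFrame (as in the
# plotting snippet below); normalize=True returns proportions.
print(df['sex'].value_counts(normalize=True))
# Expected output (approximately): Male 0.67, Female 0.33
```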

On the other hand, comparing the age distributions of people who earned more or less than 50K in the year shows that the average age of those who earned more than 50K is around 44, while the group who earned less than 50K has an average age of around 36. For the analysed dataset, then, people who earn more than 50K are, on average, older than people who earn less than 50K. It is reasonable to imagine that older people have higher incomes associated with greater experience.

Figure 1: Percentage of men and women in the dataset and age distribution according to the classification (>50K and <=50K).

It's worth noting that creating this type of visualisation with the Holistic AI library is super simple. You just need to use the group_pie_plot and histogram_plot functions.


import matplotlib.pyplot as plt
# Plotting helpers from the Holistic AI library (import path may vary by version).
from holisticai.bias.plots import group_pie_plot, histogram_plot

# One row with two panels: group proportions on the left,
# age histograms split by income class on the right.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
group_pie_plot(df['sex'], ax=axes[0])
histogram_plot(df['age'], df['class'], ax=axes[1])
plt.tight_layout()

In addition, we can analyse bias in the dataset in a simple and objective way. For example, we can generate results for five bias metrics (you can learn more about bias metrics in this Roadmaps for Risk Mitigation) with just three lines of code and thus measure whether the predictions made by a machine learning model have gender biases. Here we use the classification_bias_metrics function, but the Holistic AI library has functions for various other problems. In this case, a Four Fifths Rule value of less than 0.8 indicates a bias in favour of group_a.


import numpy as np
from holisticai.bias.metrics import classification_bias_metrics

# Boolean masks splitting the test set by the protected attribute.
group_a = np.array(X_test['sex'] == 1)
group_b = np.array(X_test['sex'] == 0)

# Returns a table of bias metrics with columns: Metric, Value, Reference.
classification_bias_metrics(group_a, group_b, y_pred, metric_type='equal_outcome')

What can we do to mitigate bias in AI?

Bias in AI can be addressed at different stages of the model life cycle, and the choice of mitigation strategy depends on factors such as data access and model parameters. The Holistic AI library offers three approaches for mitigating bias: pre-processing, in-processing, and post-processing. Pre-processing approaches transform the data before it is fed into the model, in-processing modifies the algorithm without changing the input data, and post-processing adjusts the outputs of the model.

These strategies help to improve the fairness and trustworthiness of AI systems and can be applied to a variety of model types such as binary classification, multiclass classification, regression, clustering, and recommender systems. For example, a pre-processing approach known as reweighing adjusts the importance of data points to mitigate bias; adversarial training can be used in-processing to adjust predictors associated with bias; and calibration can be used post-processing to ensure that positive outcomes are more evenly distributed across subgroups. An overview of the mitigation strategies in the Holistic AI library and the models they are suitable for can be seen in the table in the appendix.
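To make the reweighing idea concrete, here is a minimal sketch of the classic reweighing scheme (Kamiran and Calders), implemented by hand with NumPy and scikit-learn rather than through the library's own Reweighing strategy. Each (group, label) combination receives the weight P(group) * P(label) / P(group, label), so that the protected attribute and the target look statistically independent to the learner; the variable names (group_train, X_train, y_train) are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(group, y):
    """Kamiran-Calders reweighing: w(g, y) = P(g) * P(y) / P(g, y)."""
    group, y = np.asarray(group), np.asarray(y)
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            mask = (group == g) & (y == label)
            if mask.any():
                weights[mask] = (group == g).mean() * (y == label).mean() / mask.mean()
    return weights

# Illustrative usage, assuming a train split of the Adult dataset:
# sample_weight = reweighing_weights(group_train, y_train)
# model = LogisticRegression(max_iter=1000)
# model.fit(X_train, y_train, sample_weight=sample_weight)
```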

For instance, for the Adult dataset, if we are at a stage where we have access to the training data, we can employ reweighing pre-processing to guide the model training, or we can conduct a more exploratory search into the importance of each example using in-processing Grid Search. And if retraining the model is not an option, post-processing techniques like Calibrated Equalized Odds can still be used to improve fairness. The best part? With the Holistic AI library, testing out these variants can be done with minimal lines of code, making it super simple to use.


This allows us to perform rapid analyses and even experiment with integrating pre- and post-processing strategies in the same pipeline. Then, we can compare all our results using the Holistic AI metric functions:

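As a sketch of what that comparison might look like, the snippet below scores a baseline model against a mitigated variant, reusing the classification_bias_metrics call from earlier. The reweighed model and the group masks are assumed to come from the previous snippets, and the group-specific threshold step is only an illustrative stand-in for a proper post-processor such as Calibrated Equalized Odds, not the library's implementation:

```python
import numpy as np
from holisticai.bias.metrics import classification_bias_metrics

# Baseline predictions from an unweighted model (assumed trained earlier).
y_pred_baseline = baseline_model.predict(X_test)

# Mitigated predictions: reweighed model plus a simple group-specific
# decision threshold standing in for a post-processing step.
proba = model.predict_proba(X_test)[:, 1]
threshold = np.where(group_a, 0.45, 0.50)  # illustrative thresholds only
y_pred_mitigated = (proba >= threshold).astype(int)

# Compare both variants with the same metric call used earlier.
metrics_baseline = classification_bias_metrics(
    group_a, group_b, y_pred_baseline, metric_type='equal_outcome')
metrics_mitigated = classification_bias_metrics(
    group_a, group_b, y_pred_mitigated, metric_type='equal_outcome')
```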

We can go deeper and try more strategies, testing better parameter settings, and then visualise our results. Holistic AI has several visualisation methods to improve your analysis of bias mitigation results. For the Adult dataset (a classification problem), using pre-processing reweighing, below are some of the visualisations you can create using the Holistic AI library.

Figure 2: A set of visualisation graphs generated using the Holistic AI library to depict bias mitigation. These graphs represent different aspects of bias in the data.

You can find the full tutorial for this here.

Check it out

Holistic AI’s library is a valuable tool for ensuring the reliability and fairness of AI systems. With its easy-to-use interface and graphics to analyse bias, the library offers a comprehensive approach to AI assessment. If you are interested in ensuring the quality of your AI-based solutions, you should look at the Holistic AI library.

Appendix

HAI Bias Metrics

There are several metrics that can be used to measure bias depending on the type of model being used. The Holistic AI library offers a range of metrics that are suitable for these different systems, as can be seen in the table below.

| Binary Classification | Multi Classification | Regression | Clustering | Recommender System |
|---|---|---|---|---|
| Area Between ROC Curves | Multiclass Accuracy Matrix | Average Score Difference | Cluster Balance | Aggregate Diversity |
| Accuracy Difference | Confusion Matrix | Correlation difference | Minority Cluster Distribution Entropy | Average f1 ratio |
| Average Odds Difference | Confusion Tensor | Disparate Impact quantile | Cluster Distribution KL | Average precision ratio |
| Classification bias metrics batch computation | Frequency Matrix | MAE ratio | Cluster Distribution Total Variation | Average recall ratio |
| Cohen D | Multiclass Average Odds | Max absolute statistical parity | Clustering bias metrics batch computation | Average Recommendation Popularity |
| Disparate Impact | Multiclass bias metrics batch computation | No disparate impact level | Minimum Cluster Ratio | Exposure Entropy |
| Equality of opportunity difference | Multiclass Equality of Opportunity | Regression bias metrics batch computation | Silhouette Difference | Exposure KL Divergence |
| False negative rate difference | Multiclass statistical parity | RMSE ratio | Social Fairness Ratio | Exposure Total Variation |
| False positive rate difference | Multiclass True Rates | Statistical parity (AUC) |  | GINI index |
| Four Fifths | Multiclass Precision Matrix | Statistical Parity quantile |  | Mean Absolute Deviation |
| Statistical parity | Multiclass Recall Matrix | ZScore Difference |  | Recommender bias metrics batch computation |
| True negative rate difference |  |  |  | Recommender MAE ratio |
| Z Test (Difference) |  |  |  | Recommender RMSE ratio |
| Z Test (Ratio) |  |  |  |  |
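As a concrete illustration of one of these metrics, the Four Fifths check from the binary classification column can be computed by hand in a few lines. This is a sketch of the standard definition using plain NumPy, not the library's implementation:

```python
import numpy as np

def four_fifths_ratio(group_a, group_b, y_pred):
    """Ratio of positive-prediction (selection) rates between two groups.

    By the conventional four-fifths rule, a value below 0.8 flags
    adverse impact against the group with the lower selection rate.
    """
    rate_a = y_pred[group_a].mean()
    rate_b = y_pred[group_b].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Toy example: selection rates of 0.25 and 0.75 give a ratio of about
# 0.33, well below the 0.8 threshold.
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 1])
group_a = np.array([True, True, True, True, False, False, False, False])
print(four_fifths_ratio(group_a, ~group_a, y_pred))
```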

HAI Mitigation Strategies

There are several strategies that can be used to mitigate bias depending on the type of model being used:

| Type | Strategy |
|---|---|
| Pre-processing | Correlation Remover |
| Pre-processing | Disparate Impact Remover |
| Pre-processing | Learning Fair Representation |
| Pre-processing | Reweighing |
| Pre-processing | Fair Clustering |
| In-processing | Adversarial Debiasing |
| In-processing | Exponentiated Gradient |
| In-processing | Fair K Center Clustering |
| In-processing | Fair K Median Clustering |
| In-processing | Fair Scoring Classifier |
| In-processing | Fairlet Clustering |
| In-processing | Grid Search |
| In-processing | Debiasing Learning |
| In-processing | Blind Spot Aware |
| In-processing | Popularity Propensity |
| In-processing | Meta Fair Classifier |
| In-processing | Prejudice Remover |
| In-processing | Two Sided Fairness |
| In-processing | Variational Fair Clustering |
| Post-processing | Debiasing Exposure |
| Post-processing | Fair Top-K |
| Post-processing | LP Debiaser |
| Post-processing | MCMF Clustering |
| Post-processing | ML Debiaser |
| Post-processing | Plugin Estimator and Recalibration |
| Post-processing | Wasserstein Barycenters |
| Post-processing | Calibrated Equalized Odds |
| Post-processing | Equalized Odds |
| Post-processing | Reject Option Classification |
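To give a flavour of how a post-processing strategy operates, here is a conceptual sketch of the idea behind Reject Option Classification, not the library's implementation: predictions whose scores fall inside a low-confidence band around the decision boundary are reassigned in favour of the disadvantaged group, while confident predictions are left alone.

```python
import numpy as np

def reject_option_predict(scores, group_a, theta=0.1):
    """Conceptual sketch of reject-option post-processing.

    Outside the low-confidence band |score - 0.5| >= theta, keep the
    usual threshold-at-0.5 decision; inside the band, give the
    favourable label to the disadvantaged group (group_a, a boolean
    mask) and the unfavourable label to the rest.
    """
    scores = np.asarray(scores, dtype=float)
    group_a = np.asarray(group_a, dtype=bool)
    y_pred = (scores >= 0.5).astype(int)
    in_band = np.abs(scores - 0.5) < theta
    y_pred[in_band & group_a] = 1
    y_pred[in_band & ~group_a] = 0
    return y_pred
```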

