
Visualising Bias Metrics: Insights from Holistic AI's Open-Source Library

Published on August 21, 2023

Visualising information is an integral part of the decision-making process. When we visualise a dataset, our aim is to make the information clear and objective, facilitating a more sophisticated understanding and adding value to the information.

Numerous cases throughout history demonstrate the importance of reports that present data accurately. One of the most emblematic is the 1986 Challenger space shuttle accident. In the book Visual Explanations, Edward Tufte describes misinterpretations of the data that more precise visualisations could have prevented.

Unveiling bias metrics through the open-source Holistic AI Library

As artificial intelligence-based models are used more widely, and the need to validate their results grows, so does the need for visualisations that help us understand both the model itself and its decision-making process.

The purpose of this blog post is to demonstrate how to generate graphs for the results of bias metrics calculated through the Holistic AI Library, an open-source resource for improving the trustworthiness of AI systems.

Our example uses a machine learning regression task.

Report for regression tasks

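In this example, we tackle a regression problem using the Adult dataset. This well-known dataset is available in the Holistic AI Library and is widely used for analyses with machine learning models. It is usually framed as a classification task (predicting whether a person's annual income exceeds $50K); here, we instead predict the continuous fnlwgt (final weight) column from capital gain, capital loss, weekly working hours and sex.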

Below is the code used to generate the visualisations.


# base imports
import pandas as pd

# scikit-learn utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression

# Holistic AI: dataset loader, bias metrics and report plots
from holisticai.datasets import load_adult
from holisticai.bias.metrics import regression_bias_metrics
from holisticai.bias.plots import bias_metrics_report

# load the Adult dataset as a pandas DataFrame
df = load_adult()['frame']

# numerical input features
x = df[['capital-gain', 'capital-loss', 'hours-per-week']]

# one-hot encode the sensitive attribute; OneHotEncoder sorts categories,
# so column 0 is 'Female' and column 1 is 'Male'
encoder = OneHotEncoder()
enc = encoder.fit_transform(df[['sex']])
enc = pd.DataFrame(enc.toarray(), columns=['sex_female', 'sex_male'], index=df.index)

x_t = pd.concat([x, enc], axis=1)

# regression target: the standardised 'fnlwgt' column as a 1-D array
y = StandardScaler().fit_transform(df[['fnlwgt']]).ravel()

# 70/30 train/test split
x_train, x_test, y_train, y_test = train_test_split(x_t, y, test_size=0.3, random_state=0)

# group membership indicators used by the bias metrics
group_a = x_test['sex_male']
group_b = x_test['sex_female']

# baseline model: linear regression, evaluated on the test set
model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
y_true = y_test

# compute the full set of regression bias metrics
metrics = regression_bias_metrics(group_a, group_b, y_pred, y_true, metric_type='both')
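The group indicators depend on knowing which one-hot column belongs to which category. scikit-learn's OneHotEncoder orders categories by sorted value, so with the labels 'Female' and 'Male' the first column is 'Female'. The minimal sketch below uses a toy DataFrame (an assumption standing in for df[['sex']], with the labels assumed to be 'Male' and 'Female') and reads the order from encoder.categories_ rather than hard-coding the column names.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for df[['sex']] (hypothetical data)
toy = pd.DataFrame({"sex": ["Male", "Female", "Female", "Male", "Male"]})

encoder = OneHotEncoder()
encoded = encoder.fit_transform(toy[["sex"]]).toarray()

# Categories are sorted, so the learned order is ['Female', 'Male']
print(encoder.categories_[0].tolist())  # ['Female', 'Male']

# Derive the column names from the fitted categories
columns = [f"sex_{c.lower()}" for c in encoder.categories_[0]]
enc = pd.DataFrame(encoded, columns=columns)

# Boolean masks for the two groups
group_a = enc["sex_male"] == 1
group_b = enc["sex_female"] == 1
print(group_a.tolist())  # [True, False, False, True, True]
```

Deriving names from categories_ keeps the labels correct even if the category set or its order changes.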

Baseline metrics (without mitigation)

The metrics table returned for the baseline model (no mitigation strategy applied) is useful for evaluating the model, but plotting it makes the results easier to interpret.


bias_metrics_report('regression', metrics)
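To make the metric names in the report more concrete, here is a minimal NumPy sketch of three regression bias metrics of the kind the table contains. The formulas are our simplified assumptions, not necessarily Holistic AI's exact implementation: the ratios divide group a's error by group b's (1.0 means equal error), the Z-score difference is taken as a standardised mean difference with a pooled standard deviation (0 means equal average predictions), and the Q80 variants, which restrict the calculation to a quantile of the data, are omitted. The helper names are hypothetical, and the groups are passed as boolean masks.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse_ratio(group_a, group_b, y_pred, y_true):
    """Assumed definition: RMSE of group a divided by RMSE of group b."""
    return rmse(y_true[group_a], y_pred[group_a]) / rmse(y_true[group_b], y_pred[group_b])


def mae_ratio(group_a, group_b, y_pred, y_true):
    """Assumed definition: MAE of group a divided by MAE of group b."""
    return mae(y_true[group_a], y_pred[group_a]) / mae(y_true[group_b], y_pred[group_b])


def z_score_diff(group_a, group_b, y_pred):
    """Assumed definition: mean prediction gap over the pooled standard deviation."""
    a, b = y_pred[group_a], y_pred[group_b]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return float((a.mean() - b.mean()) / pooled_sd)


# Toy data: group a has smaller errors than group b
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.5, 2.0, 5.0])
group_a = np.array([True, True, False, False])
group_b = ~group_a

print(rmse_ratio(group_a, group_b, y_pred, y_true))  # 0.5
print(mae_ratio(group_a, group_b, y_pred, y_true))   # 0.5
print(round(z_score_diff(group_a, group_b, y_pred), 4))  # -0.9487
```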

Report for bias mitigation strategy

For our example, we apply a pre-processing mitigation strategy called Correlation Remover. This algorithm modifies the dataset to remove linear correlations between the features and the sensitive attributes, by applying a linear transformation to the non-sensitive feature columns.
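The linear transformation can be sketched in a few lines of NumPy: regress the centred features on the centred sensitive attribute with least squares and subtract the fitted part, which leaves features with zero linear correlation to that attribute. This is a minimal illustration of the idea under that assumption, not Holistic AI's CorrelationRemover code; remove_correlation and the synthetic data are hypothetical.

```python
import numpy as np


def remove_correlation(X, S):
    """Hypothetical helper: remove the part of X linearly explained by S."""
    X = np.asarray(X, dtype=float)
    S = np.asarray(S, dtype=float)
    # Centre the sensitive columns so the column means of X are preserved
    S_centred = S - S.mean(axis=0)
    X_centred = X - X.mean(axis=0)
    # Least-squares coefficients W such that S_centred @ W approximates X_centred
    W, *_ = np.linalg.lstsq(S_centred, X_centred, rcond=None)
    # Subtract the predicted component; the residual is uncorrelated with S
    return X - S_centred @ W


# Synthetic data: x1 depends on the binary attribute s, x2 does not
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=200)
x1 = 2.0 * s + rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x1, x2])

X_new = remove_correlation(X, s.reshape(-1, 1))
print(np.corrcoef(X[:, 0], s)[0, 1])      # clearly positive
print(np.corrcoef(X_new[:, 0], s)[0, 1])  # approximately 0
```

Because the sensitive columns are centred before fitting, the transformed features keep their original means; only the component that moves with the sensitive attribute is removed.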

The code below applies the mitigation strategy.


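# mitigation: Correlation Remover (pre-processing)
from holisticai.bias.mitigation import CorrelationRemover

# remove linear correlation between the test features and the sensitive groups;
# here the remover is fitted on the test set and the baseline model is reused,
# whereas a full pipeline would typically fit it on the training data and
# retrain the model on the transformed features
corr = CorrelationRemover()
x_test_mitigated = corr.fit_transform(x_test, group_a, group_b)

# predictions and bias metrics on the transformed test features
y_pred_mitigated = model.predict(x_test_mitigated)
metrics_mitigated = regression_bias_metrics(group_a, group_b, y_pred_mitigated, y_true, metric_type='both')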



# plot report of bias and mitigated outputs 

bias_metrics_report('regression', metrics, metrics_mitigated)

As we can observe, the graphs make the effect of the mitigation easy to see. For metrics such as Z-Score Difference, RMSE Ratio Q80 and MAE Ratio Q80, the mitigated values lie closer to their fair reference values (0 for the Z-Score Difference, 1 for the ratios), showing that the mitigation strategy improved the fairness of the model's predictions.
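Judging whether a mitigation improved fairness amounts to checking whether each metric moved closer to its reference value. The sketch below assumes the metrics tables are pandas DataFrames indexed by metric name with 'Value' and 'Reference' columns (an assumption about the library's output format); compare_to_reference is a hypothetical helper, and the numbers are toy values, not results from this analysis.

```python
import pandas as pd


def compare_to_reference(baseline: pd.DataFrame, mitigated: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: distance of each metric from its reference.

    Assumes both frames share an index of metric names and have
    'Value' and 'Reference' columns.
    """
    result = pd.DataFrame({
        "baseline_gap": (baseline["Value"] - baseline["Reference"]).abs(),
        "mitigated_gap": (mitigated["Value"] - mitigated["Reference"]).abs(),
    })
    # A metric improved if the mitigated value sits closer to its reference
    result["improved"] = result["mitigated_gap"] < result["baseline_gap"]
    return result


# Toy values for illustration only (not results from this article)
index = ["toy_difference_metric", "toy_ratio_metric"]
baseline = pd.DataFrame({"Value": [0.30, 1.40], "Reference": [0.0, 1.0]}, index=index)
mitigated = pd.DataFrame({"Value": [0.10, 1.60], "Reference": [0.0, 1.0]}, index=index)

print(compare_to_reference(baseline, mitigated))
```

Using the absolute gap treats over- and under-shooting the reference alike, which suits ratio metrics where values above and below 1 both signal disparity.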

The generated plots deepen understanding and support an accurate interpretation of the model's performance. By visually representing the outcomes of bias measurements, they provide valuable insight into the model's behaviour, its potential strengths and its areas for improvement. They serve as a compass for decision-makers, guiding them toward a more comprehensive and informed evaluation of the model's fairness and effectiveness.

Explore the Holistic AI Library

The Holistic AI Library is an open-source resource designed to elevate the trustworthiness of AI systems. It provides an array of techniques tailored to measure and mitigate bias across diverse tasks.

It covers five key risk areas in total: Bias, Efficacy, Robustness, Privacy and Explainability. The broad range of tools in the library enables a comprehensive assessment of AI systems and applications, providing a platform for transparent and reliable AI.
